[Advanced Plotting] Paired Box Plots (Part 7)

Published: 2020-07-29. Source: 天昊生物

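The basic course earlier in this series covered box plots. This installment covers box plots for paired samples, so if you have paired-sample data, don't miss it. The lesson has two parts, data processing and plotting, and includes more detailed written explanations. If anything remains unclear, leave us a comment and we will reply promptly.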

1. Reading the abundance data and group information

The raw abundance table is read into data; the group information is read into info.

In [1]:

# Tab-delimited text despite the .xls extension; the first column becomes the row names
data = read.table('phylum.taxon.Abundance.xls',header = T,sep = '\t',row.names = 1)
data

Out[1]:

In [2]:

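# No header row; stringsAsFactors = F keeps sample names as character so they select columns by name
info = read.table('sample_groups.xls',header = F,sep = '\t',col.names = c('sample','group'),stringsAsFactors = F)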
info

Out[2]:
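To try the two read.table calls without the original files, the sketch below reads equivalent tab-separated text from in-memory strings. The sample names (S1 to S4), the abundance values, and the object names toy_data and toy_info are made up for illustration; only the read.table arguments mirror the commands above.

```r
# Toy stand-ins for the two input files; all values are illustrative.
abundance_txt <- "Taxon\tS1\tS2\tS3\tS4\nBacteroidetes\t0.40\t0.35\t0.20\t0.15\nProteobacteria\t0.10\t0.15\t0.30\t0.35\nAll\t0.50\t0.50\t0.50\t0.50"
groups_txt <- "S1\tH\nS2\tH\nS3\tL\nS4\tL"

# sep = "\t" splits on tabs; row.names = 1 turns the first column into row names.
toy_data <- read.table(text = abundance_txt, header = TRUE, sep = "\t", row.names = 1)

# The group file has no header line, so column names are supplied explicitly.
toy_info <- read.table(text = groups_txt, header = FALSE, sep = "\t",
                       col.names = c("sample", "group"),
                       stringsAsFactors = FALSE)

print(dim(toy_data))      # taxa x samples
print(toy_info$sample)    # sample names as character strings
```

Keeping the sample names as character strings matters later, because they are used to select columns of the abundance table by name.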

2. 提取數(shù)據(jù)框

Use grep to get the row numbers of All and Verrucomicrobia.

In [3]:

grep("All|Verrucomicrobia", rownames(data))

Out[3]:

10  12

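The processed data are stored in the data frame df. A negative index, -grep(pattern, rownames(data)), drops the rows whose names match the pattern, and info$sample selects the sample columns. The lookup above uses the pattern "All|Verrucomicrobia", while the command below passes "All|Unassigned"; use the pattern that matches the rows you want to exclude.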

In [4]:

df = data[-grep("All|Unassigned", rownames(data)),info$sample]
df

Out[4]:
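The row-dropping idiom can be checked on a small table. This is a minimal sketch on made-up data (the object names toy, hits, kept, and safe are illustrative); it also shows an edge case of negative grep indexing and a grepl-based alternative.

```r
# Toy abundance table; taxon names and values are illustrative only.
toy <- data.frame(S1 = c(0.4, 0.1, 0.5), S2 = c(0.3, 0.2, 0.5),
                  row.names = c("Bacteroidetes", "Proteobacteria", "All"))

# grep() returns the positions of the matching row names ...
hits <- grep("All|Unassigned", rownames(toy))
print(hits)

# ... and a negative index drops those rows.
kept <- toy[-hits, ]
print(rownames(kept))

# Caveat: if nothing matches, grep() returns integer(0), and toy[-integer(0), ]
# selects zero rows. A logical mask from grepl() avoids that edge case.
no_hits <- grep("Unassigned", rownames(toy))
print(nrow(toy[-no_hits, ]))                          # 0 rows
safe <- toy[!grepl("Unassigned", rownames(toy)), ]
print(nrow(safe))                                     # 3 rows
```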

3. Compute the mean, variance, and standard deviation of each group, and the paired-sample t-test p-value.

In [5]:



df['H_mean'] = apply(df[,info[info$group=='H',1]],1,mean)
df['H_var'] = apply(df[,info[info$group=='H',1]],1,var)
df['H_sd'] = apply(df[,info[info$group=='H',1]],1,sd)

df['L_mean'] = apply(df[,info[info$group=='L',1]],1,mean)
df['L_var'] = apply(df[,info[info$group=='L',1]],1,var)
df['L_sd'] = apply(df[,info[info$group=='L',1]],1,sd)

# Paired t-test for each taxon (row); the i-th H sample is paired with the i-th L sample
df['p.value'] = apply(df,1,function(x){
  t.test(as.numeric(x[info[info$group=='H',1]]),
         as.numeric(x[info[info$group=='L',1]]),paired = T)$p.value
})

df

Out[5]:
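To see what paired = T changes, the sketch below runs the test on one made-up taxon measured in four hypothetical subjects under conditions H and L. The vectors h and l and their values are assumptions for illustration. A paired t-test is the same as a one-sample t-test on the within-pair differences, which is why the order of samples within each group matters.

```r
# Made-up abundances for one taxon in four subjects, measured under H and L.
h <- c(0.40, 0.35, 0.42, 0.38)
l <- c(0.30, 0.33, 0.31, 0.29)

# Paired test: the i-th value of h is matched with the i-th value of l.
paired_p <- t.test(h, l, paired = TRUE)$p.value

# Equivalent formulation: one-sample t-test on the within-pair differences.
diff_p <- t.test(h - l, mu = 0)$p.value

# Unpaired (Welch) test on the same numbers, for comparison.
welch_p <- t.test(h, l)$p.value

print(c(paired = paired_p, differences = diff_p, welch = welch_p))
```

The first two p-values agree; the third generally differs because it ignores the pairing.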

Save the results to the text file phylum.taxon.Abundance.new.xls.

In [6]:

write.table(df,'phylum.taxon.Abundance.new.xls',sep = '\t',row.names = T, col.names = NA,quote = F)

4. Extract the taxa with significant p-values

In [7]:

df_diff = df[df$p.value<0.05,info$sample]
df_diff

Out[7]:
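The filter combines a logical condition on rows with name-based selection of columns. The sketch below, on made-up data with hypothetical taxon names, shows the same pattern and one defensive variant: if a p-value is ever NA, a plain logical index returns an extra all-NA row, whereas which() keeps only rows where the comparison is TRUE.

```r
# Toy table with one missing p-value; all names and values are illustrative.
toy <- data.frame(S1 = c(0.4, 0.1, 0.2), S2 = c(0.2, 0.1, 0.3),
                  p.value = c(0.01, 0.60, NA),
                  row.names = c("TaxonA", "TaxonB", "TaxonC"))
sample_cols <- c("S1", "S2")

# Logical filter on rows, name-based selection on columns.
sig <- toy[toy$p.value < 0.05, sample_cols]
print(sig)        # the NA comparison produces an extra all-NA row

# which() returns only the positions where the comparison is TRUE.
sig_safe <- toy[which(toy$p.value < 0.05), sample_cols]
print(sig_safe)
```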

1. Merging the data

In [8]:

df_diff = data.frame(t(df_diff),group = info$group) 
df_diff

Out[8]:
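The merge step transposes the table so that each row is a sample and each column a taxon, then attaches the group labels. The sketch below reproduces this on made-up data (toy, toy_info, and wide are illustrative names). As an added precaution that is not part of the original command, it looks up each group label by sample name with match() instead of relying on the two tables being in the same order.

```r
# Toy taxa x samples table and group sheet; values are illustrative.
toy <- data.frame(S1 = c(0.40, 0.10), S2 = c(0.35, 0.15),
                  S3 = c(0.20, 0.30), S4 = c(0.15, 0.35),
                  row.names = c("Bacteroidetes", "Proteobacteria"))
toy_info <- data.frame(sample = c("S1", "S2", "S3", "S4"),
                       group = c("H", "H", "L", "L"),
                       stringsAsFactors = FALSE)

# t() flips the table to samples x taxa; rows follow the column order of toy.
# match() finds each sample's row in toy_info, so labels are aligned by name.
wide <- data.frame(t(toy),
                   group = toy_info$group[match(colnames(toy), toy_info$sample)])
print(wide)
```

Note that data.frame() converts column names into syntactically valid R names by default, so taxon names containing spaces or hyphens would be altered at this step.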

2. Loading the R package

In [9]:


# Optional, machine-specific: the library directory where ggpubr is installed
.libPaths("C:/Program Files/R/R-3.6.1/library")
library(ggpubr)

3. Plotting

The method argument accepts t.test, wilcox.test, anova, and kruskal.test; this analysis uses t.test.

ggarrange places several plots on a single figure; ncol = 2 puts two plots in each row, with no limit on the number of rows.

In [10]:


p1= ggpaired(df_diff, x="group", y="Bacteroidetes", color = "group", line.color = "gray",xlab = 'Bacteroidetes',
             line.size = 0.4, palette = "npg")+ stat_compare_means(method = "t.test",paired = TRUE)

p2= ggpaired(df_diff, x="group", y="Proteobacteria", color = "group", line.color = "gray", xlab = 'Proteobacteria',
             line.size = 0.4, palette = "npg")+ stat_compare_means(method = "t.test",paired = TRUE)
ggarrange(p1, p2,  ncol = 2,common.legend = T)

Out[10]:
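When more than two taxa are significant, writing one ggpaired call per taxon becomes repetitive. The sketch below builds the plots in a loop and passes them to ggarrange through its plotlist argument. It assumes ggpubr is installed and uses made-up data in the same layout as df_diff; the object names toy, taxa, plots, and figure are illustrative.

```r
library(ggpubr)

# Toy data laid out like df_diff: one row per sample, one column per taxon,
# plus a group column. Values are made up; rows are ordered so that the i-th
# H sample and the i-th L sample form a pair.
toy <- data.frame(
  Bacteroidetes  = c(0.40, 0.35, 0.42, 0.38, 0.30, 0.33, 0.31, 0.29),
  Proteobacteria = c(0.10, 0.15, 0.12, 0.11, 0.30, 0.28, 0.35, 0.27),
  group = rep(c("H", "L"), each = 4)
)

# Every column except group is treated as a taxon to plot.
taxa <- setdiff(colnames(toy), "group")

# One paired box plot per taxon, with the same settings as p1 and p2.
plots <- lapply(taxa, function(taxon) {
  ggpaired(toy, x = "group", y = taxon, color = "group", line.color = "gray",
           xlab = taxon, line.size = 0.4, palette = "npg") +
    stat_compare_means(method = "t.test", paired = TRUE)
})

# plotlist takes a list of plots, so the call does not grow with the number of taxa.
figure <- ggarrange(plotlist = plots, ncol = 2, common.legend = TRUE)
```

Printing figure (or calling ggsave on it) renders the combined plot.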