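Since April, the Tianhao bioinformatics team has been publishing a series of basic R tutorials, and many researchers have started learning R. Some readers have recently asked how to sum the columns belonging to one group of samples. The question comes up for two reasons: samples in an Excel sheet are often arranged in no particular order, so even sorting cannot bring the samples of one group together; or there are many samples and many groups, so the sum function has to be typed in again and again. This tutorial shows how to do the calculation in R.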
I. Read the data
1. Read the absolute abundance data
In [1]:
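# The file is tab-delimited despite its .xls extension, so sep must be '\t'.
# check.names = FALSE keeps sample names such as C-1 from being rewritten as C.1.
df <- read.table('genus.taxon.Abundance.xls', header = TRUE, row.names = 1, sep = '\t', check.names = FALSE, quote = '', fill = TRUE)
head(df, 10)
Out[1]: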
C-1 C-2 C-3 C-4 C-5 C-6 C-7 M-1 M-2 M-3 ... M-5 M-6 M-7 M-8 Abundance superkingdom phylum class order family
No_Rank 32729 43885 36532 29335 21815 26879 41869 31030 29862 33787 ... 43164 30980 21853 21792 471140 - - - - -
Lactobacillus 24438 24449 3760 18204 28795 4537 12831 24450 17354 26211 ... 30850 19648 21548 18263 291188 Bacteria Firmicutes Bacilli Lactobacillales Lactobacillaceae
Unassigned 25317 21725 13405 19347 14697 16676 19162 15819 22083 21072 ... 12032 7750 33000 26360 279560 Bacteria Bacteroidetes Bacteroidia Bacteroidales Porphyromonadaceae
Alistipes 5720 6196 18812 14525 19759 21996 10696 4411 5670 3663 ... 5744 6580 5415 5571 146280 Bacteria Bacteroidetes Bacteroidia Bacteroidales Rikenellaceae
Parabacteroides 6092 2683 8164 10246 10619 8742 7885 4493 12209 2423 ... 5628 4943 6382 11633 118324 Bacteria Bacteroidetes Bacteroidia Bacteroidales Porphyromonadaceae
Bacteroides 9803 3257 7719 5919 8039 6959 3502 6514 8694 3840 ... 7839 4368 11986 15711 116935 Bacteria Bacteroidetes Bacteroidia Bacteroidales Bacteroidaceae
Barnesiella 2982 1414 2341 9184 3707 6875 3812 3183 2773 5250 ... 3128 6584 3764 6680 64267 Bacteria Bacteroidetes Bacteroidia Bacteroidales Porphyromonadaceae
Eisenbergiella 680 2283 4357 1743 1251 6988 3731 1832 2204 1598 ... 1193 649 3466 4671 45890 Bacteria Firmicutes Clostridia Clostridiales Lachnospiraceae
Oscillibacter 653 714 4704 1569 2210 4680 4303 2748 4912 800 ... 2424 729 1287 2486 37738 Bacteria Firmicutes Clostridia Clostridiales Ruminococcaceae
Turicibacter 1373 2366 20 1405 56 6 2091 5060 9 9229 ... 1 1 4360 1693 27673 Bacteria Firmicutes Erysipelotrichia Erysipelotrichales Erysipelotrichaceae
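The options passed to read.table above can be tried on a small file first. The sketch below is a hypothetical miniature of the abundance table (the genus names, sample names, and counts are invented), written to a temporary file and read back with the same arguments.

```r
# Hypothetical miniature of the abundance table; all values are invented.
tmp <- tempfile(fileext = ".xls")
writeLines(c(
  "Genus\tS-1\tS-2\tAbundance\tphylum",
  "GenusA\t10\t30\t40\tFirmicutes",
  "GenusB\t90\t70\t160\tBacteroidetes"
), tmp)

# sep = "\t" splits on tabs; check.names = FALSE keeps "S-1" from becoming "S.1".
toy <- read.table(tmp, header = TRUE, row.names = 1, sep = "\t",
                  check.names = FALSE, quote = "", fill = TRUE)

print(colnames(toy))
str(toy)
```

With check.names left at its default, the hyphenated sample names would be altered and would no longer match the names in the grouping file.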
In [2]:
info = read.table('sample_groups.xls', header = FALSE, col.names = c('sample', 'group'))
head(info)
Out[2]:
sample group
C-1 Control
C-2 Control
C-3 Control
C-4 Control
C-5 Control
C-6 Control
In [3]:
info$sample
Out[3]:
C-1 C-2 C-3 C-4 C-5 C-6 C-7 M-1 M-2 M-3 M-4 M-5 M-6 M-7 M-8
In [4]:
# Keep only the sample columns, in the order listed in the grouping file
df = df[as.character(info$sample)]
head(df)
Out[4]:
C-1 C-2 C-3 C-4 C-5 C-6 C-7 M-1 M-2 M-3 M-4 M-5 M-6 M-7 M-8
No_Rank 32729 43885 36532 29335 21815 26879 41869 31030 29862 33787 25628 43164 30980 21853 21792
Lactobacillus 24438 24449 3760 18204 28795 4537 12831 24450 17354 26211 15850 30850 19648 21548 18263
Unassigned 25317 21725 13405 19347 14697 16676 19162 15819 22083 21072 11115 12032 7750 33000 26360
Alistipes 5720 6196 18812 14525 19759 21996 10696 4411 5670 3663 11522 5744 6580 5415 5571
Parabacteroides 6092 2683 8164 10246 10619 8742 7885 4493 12209 2423 16182 5628 4943 6382 11633
Bacteroides 9803 3257 7719 5919 8039 6959 3502 6514 8694 3840 12785 7839 4368 11986 15711
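Selecting columns by the names in the grouping file only works when every listed sample exists in the table. A minimal sketch of that check follows, on hypothetical toy data (toy, toy_info, and all names in them are invented for illustration).

```r
# Hypothetical toy data; sample and group names are invented.
toy <- data.frame(`S-2` = c(3, 4), `S-1` = c(1, 2), Abundance = c(4, 6),
                  check.names = FALSE, row.names = c("GenusA", "GenusB"))
toy_info <- data.frame(sample = c("S-1", "S-2"), group = c("G1", "G2"),
                       stringsAsFactors = FALSE)

# Samples listed in the grouping file but absent from the table would make
# toy[samples] fail with "undefined columns selected", so check first.
samples <- as.character(toy_info$sample)
missing_samples <- setdiff(samples, colnames(toy))
stopifnot(length(missing_samples) == 0)

# Keep only the sample columns, in the order given by the grouping file.
toy_counts <- toy[samples]
print(toy_counts)
```

The subset drops annotation columns such as Abundance and reorders the samples to follow the grouping file.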
In [5]:
info$group
Out[5]:
Control Control Control Control Control Control Control NNK-BaP NNK-BaP NNK-BaP NNK-BaP NNK-BaP NNK-BaP NNK-BaP NNK-BaP
The grouping information shows that there are only two groups, named Control and NNK-BaP.
II. Compute the sum, mean, variance, and standard deviation of abundance for each group
Check which samples belong to the Control group
In [6]:
info[info$group=='Control',1]
Out[6]:
C-1 C-2 C-3 C-4 C-5 C-6 C-7
Use apply to compute the row-wise sum, mean, variance, and standard deviation for the Control group
In [7]:
df['Control_sum'] = apply(df[as.character(info[info$group=='Control',1])],1,sum)
df['Control_mean'] = apply(df[as.character(info[info$group=='Control',1])],1,mean)
df['Control_var'] = apply(df[as.character(info[info$group=='Control',1])],1,var)
df['Control_sd'] = apply(df[as.character(info[info$group=='Control',1])],1,sd)
head(df)
Out[7]:
C-1 C-2 C-3 C-4 C-5 C-6 C-7 M-1 M-2 M-3 M-4 M-5 M-6 M-7 M-8 Control_sum Control_mean Control_var Control_sd
No_Rank 32729 43885 36532 29335 21815 26879 41869 31030 29862 33787 25628 43164 30980 21853 21792 233044 33292.000 64182849 8011.420
Lactobacillus 24438 24449 3760 18204 28795 4537 12831 24450 17354 26211 15850 30850 19648 21548 18263 117014 16716.286 99804027 9990.197
Unassigned 25317 21725 13405 19347 14697 16676 19162 15819 22083 21072 11115 12032 7750 33000 26360 130329 18618.429 16946400 4116.601
Alistipes 5720 6196 18812 14525 19759 21996 10696 4411 5670 3663 11522 5744 6580 5415 5571 97704 13957.714 43482964 6594.161
Parabacteroides 6092 2683 8164 10246 10619 8742 7885 4493 12209 2423 16182 5628 4943 6382 11633 54431 7775.857 7342272 2709.663
Bacteroides 9803 3257 7719 5919 8039 6959 3502 6514 8694 3840 12785 7839 4368 11986 15711 45198 6456.857 5800759 2408.477
Use apply to compute the row-wise sum, mean, variance, and standard deviation for the NNK-BaP group
In [8]:
df['NNK-BaP_sum'] = apply(df[as.character(info[info$group=='NNK-BaP',1])],1,sum)
df['NNK-BaP_mean'] = apply(df[as.character(info[info$group=='NNK-BaP',1])],1,mean)
df['NNK-BaP_var'] = apply(df[as.character(info[info$group=='NNK-BaP',1])],1,var)
df['NNK-BaP_sd'] = apply(df[as.character(info[info$group=='NNK-BaP',1])],1,sd)
head(df)
Out[8]:
C-1 C-2 C-3 C-4 C-5 C-6 C-7 M-1 M-2 M-3 ... M-7 M-8 Control_sum Control_mean Control_var Control_sd NNK-BaP_sum NNK-BaP_mean NNK-BaP_var NNK-BaP_sd
No_Rank 32729 43885 36532 29335 21815 26879 41869 31030 29862 33787 ... 21853 21792 233044 33292.000 64182849 8011.420 238096 29762.000 48868388 6990.593
Lactobacillus 24438 24449 3760 18204 28795 4537 12831 24450 17354 26211 ... 21548 18263 117014 16716.286 99804027 9990.197 174174 21771.750 25821419 5081.478
Unassigned 25317 21725 13405 19347 14697 16676 19162 15819 22083 21072 ... 33000 26360 130329 18618.429 16946400 4116.601 149231 18653.875 72916680 8539.126
Alistipes 5720 6196 18812 14525 19759 21996 10696 4411 5670 3663 ... 5415 5571 97704 13957.714 43482964 6594.161 48576 6072.000 5639229 2374.706
Parabacteroides 6092 2683 8164 10246 10619 8742 7885 4493 12209 2423 ... 6382 11633 54431 7775.857 7342272 2709.663 63893 7986.625 22692800 4763.696
Bacteroides 9803 3257 7719 5919 8039 6959 3502 6514 8694 3840 ... 11986 15711 45198 6456.857 5800759 2408.477 71737 8967.125 17710582 4208.394
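With more than two groups, the four apply calls have to be repeated for each one. One possible alternative, sketched here on hypothetical toy data (toy, toy_info, and the group names G1 and G2 are invented), loops over the group names and uses rowSums and rowMeans for the sum and mean.

```r
# Hypothetical toy counts; the numbers and names are invented for illustration.
toy <- data.frame(`S-1` = c(1, 10), `S-2` = c(3, 30), `T-1` = c(5, 50),
                  `T-2` = c(9, 70), check.names = FALSE,
                  row.names = c("GenusA", "GenusB"))
toy_info <- data.frame(sample = c("S-1", "S-2", "T-1", "T-2"),
                       group = c("G1", "G1", "G2", "G2"),
                       stringsAsFactors = FALSE)

stats <- toy
for (g in unique(as.character(toy_info$group))) {
  # Sample columns that belong to group g, taken from the original counts only.
  cols <- as.character(toy_info$sample[toy_info$group == g])
  sub <- toy[cols]
  stats[[paste0(g, "_sum")]]  <- rowSums(sub)
  stats[[paste0(g, "_mean")]] <- rowMeans(sub)
  stats[[paste0(g, "_var")]]  <- apply(sub, 1, var)
  stats[[paste0(g, "_sd")]]   <- apply(sub, 1, sd)
}
print(stats)
```

Reading the sample columns from the untouched toy table, and writing results into a separate stats table, keeps the derived columns from one group out of the calculation for the next.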
III. Compute relative abundance by column
In [9]:
df_Relative = apply(df,2,function(d) d/sum(d))
head(df_Relative)
Out[9]:
C-1 C-2 C-3 C-4 C-5 C-6 C-7 M-1 M-2 M-3 ... M-7 M-8 Control_sum Control_mean Control_var Control_sd NNK-BaP_sum NNK-BaP_mean NNK-BaP_var NNK-BaP_sd
No_Rank 0.13115102 0.17585513 0.14639033 0.11755065 0.08741665 0.10770901 0.16777666 0.12434282 0.11966244 0.135390620 ... 0.08756892 0.08732449 0.13340707 0.13340707 0.24318010 0.14420668 0.11926172 0.11926172 0.19457417 0.10579967
Lactobacillus 0.09792749 0.09797157 0.01506700 0.07294672 0.11538677 0.01818058 0.05141614 0.09797557 0.06954062 0.105032218 ... 0.08634673 0.07318314 0.06698518 0.06698518 0.37814391 0.17982493 0.08724334 0.08724334 0.10281045 0.07690602
Unassigned 0.10144980 0.08705600 0.05371626 0.07752693 0.05889354 0.06682375 0.07678560 0.06338959 0.08849058 0.084439315 ... 0.13223697 0.10562929 0.07460741 0.07460741 0.06420761 0.07409938 0.07474945 0.07474945 0.29032475 0.12923607
Alistipes 0.02292107 0.02482849 0.07538309 0.05820430 0.07917789 0.08814195 0.04286081 0.01767567 0.02272072 0.014678304 ... 0.02169888 0.02232400 0.05593109 0.05593109 0.16475105 0.11869583 0.02433160 0.02433160 0.02245313 0.03594017
Parabacteroides 0.02441175 0.01075127 0.03271462 0.04105758 0.04255225 0.03503078 0.03159662 0.01800426 0.04892367 0.009709399 ... 0.02557383 0.04661554 0.03115927 0.03115927 0.02781887 0.04877431 0.03200385 0.03200385 0.09035356 0.07209653
Bacteroides 0.03928239 0.01305139 0.03093143 0.02371850 0.03221373 0.02788597 0.01403315 0.02610278 0.03483843 0.015387575 ... 0.04803007 0.06295682 0.02587379 0.02587379 0.02197829 0.04335292 0.03593289 0.03593289 0.07051638 0.06369227
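The call above normalizes every column of df, including the derived statistics columns such as Control_var and Control_sd, for which a column-wise proportion has no obvious interpretation. If only per-sample relative abundance is wanted, one option is to restrict the calculation to the sample columns. The sketch below assumes that goal and uses hypothetical toy data (all names and counts are invented).

```r
# Hypothetical toy counts plus one derived column, as produced in step II.
toy <- data.frame(`S-1` = c(10, 30, 60), `S-2` = c(5, 5, 10),
                  check.names = FALSE,
                  row.names = c("GenusA", "GenusB", "GenusC"))
toy$G1_sum <- rowSums(toy[c("S-1", "S-2")])

# Work on the sample columns only, leaving derived columns out.
sample_cols <- c("S-1", "S-2")
counts <- as.matrix(toy[sample_cols])

# Divide every column by its own total so that each sample sums to 1.
rel <- sweep(counts, 2, colSums(counts), "/")
print(rel)
print(colSums(rel))
```

Checking that every column of the result sums to 1 is a quick way to confirm the normalization ran along the intended dimension.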
IV. Write the result to a file
In [10]:
# Row names (genus names) are written as the first column
write.csv(df_Relative, 'genus.taxon.RelativeAbundance.csv')
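To confirm that a saved table can be loaded again with its row and column names intact, it can be read back with read.csv. This is a minimal sketch on a hypothetical two-by-two matrix written to a temporary file; the values and names are invented.

```r
# Hypothetical relative-abundance matrix; values are invented.
rel <- matrix(c(0.25, 0.75, 0.5, 0.5), nrow = 2,
              dimnames = list(c("GenusA", "GenusB"), c("S-1", "S-2")))

out <- tempfile(fileext = ".csv")
write.csv(rel, out)

# row.names = 1 restores the genus names; check.names = FALSE keeps "S-1" as is.
back <- read.csv(out, row.names = 1, check.names = FALSE)
print(back)
```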