Relative abundance of OTUs (i.e., *.tax.summary) for more curated data?

chrisjw · June 3, 2015, 9:17am

Hi,

If I use the classify.seqs command with relabund=T, the resulting file ‘stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.nr_v119.wang.tax.summary’ gives an excellent breakdown of the relative abundance of OTUs per sample at each taxonomic level.

However, after classify.seqs I continued through the MiSeq SOP with these commands:

mothur > remove.lineage(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)

mothur > cluster.split(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta,count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table,taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15)

mothur > make.shared(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list,count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, label=0.03, processors=8)

#get the consensus taxonomy for each OTU using:
mothur > classify.otu(list=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list,count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table,taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.pick.taxonomy, label=0.03)

From classify.otu we get an updated *.tax.summary, but there doesn’t appear to be an option to express it as relative abundance.
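Until such an option exists, the per-group counts in a *.tax.summary can be converted by hand. The following is a minimal Python sketch, not a mothur feature, and it rests on assumptions about the file layout: tab-delimited columns taxlevel, rankID, taxon, daughterlevels, total, then one column per group, with the taxlevel 0 ("Root") row holding each column's total. Each count is divided by the Root value of its own column.

```python
# Minimal sketch: convert the per-group counts in a mothur *.tax.summary into
# relative abundances. Assumed layout (tab-delimited): taxlevel, rankID, taxon,
# daughterlevels, total, then one column per group; the taxlevel 0 ("Root")
# row is assumed to hold each column's total.
FIXED_COLUMNS = 4  # taxlevel, rankID, taxon, daughterlevels


def to_relabund(summary_text):
    """Return [header, row, ...] with every count column divided by its Root value."""
    lines = [ln.rstrip() for ln in summary_text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    rows = [ln.split("\t") for ln in lines[1:]]
    root = next(r for r in rows if r[0] == "0")
    totals = [float(v) for v in root[FIXED_COLUMNS:]]
    out = [header]
    for r in rows:
        # Guard against empty groups so a zero total does not raise an error.
        fractions = [float(v) / t if t else 0.0
                     for v, t in zip(r[FIXED_COLUMNS:], totals)]
        out.append(r[:FIXED_COLUMNS] + fractions)
    return out
```

To pull out phyla you would then keep the rows whose taxlevel is "2", assuming the domain sits at level 1 as in the SILVA-based taxonomy used here.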

What I would really like is a breakdown of the relative abundance of OTUs per sample, as in the *.wang.tax.summary file produced by classify.seqs, but based on the more processed data (i.e., with undesirable domains removed, etc.) at the level of the outputs produced by classify.otu().

Is there a way to produce the same kind of relative abundance summary that classify.seqs gives, but at the level of the classify.otu output?

Many thanks,

Chris.

westcott · June 3, 2015, 1:04pm

There currently isn’t a relabund option in classify.otu, but thanks for the feature request. As a workaround, you could use the summary.tax command after making some manual modifications to the *.cons.taxonomy file, which looks like this:

OTU Size Taxonomy
Otu0001 12328 Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);unclassified(100);
Otu0002 8914 Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);unclassified(100);
Otu0003 7850 Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);unclassified(100);
…

The first two columns are basically a count file, and the first and third columns are a taxonomy file. If you split this one file into two new files, you could run:

summary.tax(taxonomy=yourModTaxFile, count=yourModCountFile, relabund=t)
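Splitting the file can be scripted. This is a minimal Python sketch, assuming *.cons.taxonomy is tab-delimited with a single "OTU Size Taxonomy" header line as in the excerpt above. The count file it writes uses a two-column Representative_Sequence/total header, which is an assumption about the layout of a mothur count_table without groups; write_split and all file names are hypothetical.

```python
def split_cons_taxonomy(cons_text):
    """Split a *.cons.taxonomy table (OTU, Size, Taxonomy) into
    (taxonomy_text, count_text) for use with summary.tax."""
    lines = [ln.rstrip() for ln in cons_text.splitlines() if ln.strip()]
    tax_lines = []
    count_lines = ["Representative_Sequence\ttotal"]  # assumed count_table header
    for line in lines[1:]:  # lines[0] is the "OTU Size Taxonomy" header
        otu, size, taxonomy = line.split("\t")
        tax_lines.append(f"{otu}\t{taxonomy}")  # columns 1 and 3 -> taxonomy file
        count_lines.append(f"{otu}\t{size}")    # columns 1 and 2 -> count file
    return "\n".join(tax_lines) + "\n", "\n".join(count_lines) + "\n"


def write_split(cons_path, tax_path, count_path):
    """Hypothetical helper: read cons_path and write the two derived files."""
    with open(cons_path) as fh:
        tax_text, count_text = split_cons_taxonomy(fh.read())
    with open(tax_path, "w") as fh:
        fh.write(tax_text)
    with open(count_path, "w") as fh:
        fh.write(count_text)
```

For example, write_split("your.cons.taxonomy", "mod.taxonomy", "mod.count_table") followed by summary.tax(taxonomy=mod.taxonomy, count=mod.count_table, relabund=t) in mothur.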

chrisjw · June 3, 2015, 1:54pm

Hi,

Thanks for the quick reply.

Your advice worked perfectly to produce a tax.summary file in the right format, with relative abundance. However, unlike the *.cons.tax.summary, it has only a ‘total’ column rather than one column per sample.

I see that summary.tax() accepts a ‘group’ file. In other commands I have been using a .design file that looks like this:

L1_AAGCGT lower
L1_ACATCG upper
L1_ACCCGT lower
L1_ATTGGC upper
L1_CACTGT upper
L1_CGTGAT upper
L1_CTGATC middle
L1_GAACGA middle
L1_GAATGT middle
L1_GATCTG upper
L1_GCAGTA lower
L1_GCGATT lower
L1_GCTTAC lower
L1_GTAGCC middle
L1_TACAAG middle
L1_TACGGA lower
L1_TCAAGT middle
L1_TGGTCA upper

Is it possible to use or adapt this for the group argument, so that I end up with either a column per group (left column above) or per treatment (right column)?
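If per-sample columns are available in a count table, the design file could be used to pool samples by treatment before running summary.tax. The following is a minimal Python sketch, not a mothur command. It assumes an uncompressed, tab-delimited count_table whose first two columns are Representative_Sequence and total followed by one column per sample, and a whitespace-separated two-column design file like the one above. Pooling counts weights each sample by its sequencing depth; averaging per-sample relative abundances instead gives every sample equal weight.

```python
from collections import defaultdict


def read_design(design_text):
    """Map sample (group) name -> treatment from a two-column design file."""
    design = {}
    for line in design_text.splitlines():
        if line.strip():
            group, treatment = line.split()
            design[group] = treatment
    return design


def pool_count_table(count_text, design):
    """Sum the per-sample columns of a count table into per-treatment columns."""
    lines = [ln.rstrip() for ln in count_text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    samples = header[2:]  # columns after Representative_Sequence and total
    treatments = sorted(set(design[s] for s in samples))
    out = ["\t".join(header[:2] + treatments)]
    for line in lines[1:]:
        fields = line.split("\t")
        pooled = defaultdict(int)
        for sample, value in zip(samples, fields[2:]):
            pooled[design[sample]] += int(value)
        out.append("\t".join(fields[:2] + [str(pooled[t]) for t in treatments]))
    return "\n".join(out) + "\n"
```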

Many thanks for your help.

Chris.

chrisjw · June 3, 2015, 5:38pm

Further to my post above:

I had a look at group files and tried to make one myself in R from the ‘stability.an.shared’ file, using the code below:

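# Read the shared file (columns: label, Group, numOtus, then one column per OTU)
df1<-read.table("stability.an.shared", header=T)

# unique() works whether Group is read as a factor or as character
groups<-unique(as.character(df1$Group))

for(i in 1:length(groups)){
# keep this sample's row and only the OTU count columns
x<-df1[df1$Group==groups[i],]
x<-x[,4:ncol(x)]
# transpose to one row per OTU
x<-t(as.data.frame(x))
x<-as.data.frame(x)
x$otu<-row.names(x)
# drop OTUs with a zero count in this sample
x[x[,1]==0,]<-NA
x<-as.data.frame(x[!is.na(x[,1]),])
x<-x[,c(2,1)]
colnames(x)[2]<-"group"
x$group<-paste(groups[i])
nam<-paste(groups[i])
assign(nam,x, envir=.GlobalEnv)
print(paste("loop is ",round(i/length(groups)*100), "% done"))
}

# stack the per-sample tables into one two-column group file
y<-lapply(groups,get)
z<-do.call(rbind,y)

write.table(z, file="otu.groups", quote=F, row.names=F, col.names=F, sep="\t")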

This gave me a group file whose first column holds OTU names and whose second column holds the name of a group in which that OTU occurs (i.e., an OTU with a zero count for a given group in stability.an.shared was left out for that group).

I then ran summary.tax with the modified .taxonomy file you suggested in the previous post and the group file made above:

mothur> summary.tax(taxonomy=mod.taxonomy, group=otu.groups, relabund=T)

While the command was running, mothur repeatedly printed the following:

Your groupfile contains more than 1 sequence named Otu000001, sequence names must be unique. Please correct.

This was probably reported for every OTU that was present in more than one group. However, mothur still produced a *.tax.summary file that appears to contain relative abundance data for all of my groups.

I then loaded this into R, subset the phylum-level rows, and made a table of treatment averages across my groups. I compared this with the averages table I made from the *.tax.summary file output by classify.otu (whose phylum-level counts I had converted to relative abundances myself). The two disagree: the relative abundance of each phylum differs, a couple of the phyla themselves differ, and the newly made *.tax.summary has several phyla with zeros across all groups.
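The treatment-averaging step can be sketched in Python as follows. This assumes the per-sample relative abundances for one taxonomic level are already in a dictionary of the form {taxon: {sample: value}} and that design maps sample names to treatments; both structures are hypothetical. The mean is unweighted, so each sample counts equally, and two averages tables are only comparable if both used the same per-sample totals as denominators.

```python
from collections import defaultdict
from statistics import mean


def treatment_means(relabund, design):
    """{taxon: {sample: relative abundance}} -> {taxon: {treatment: mean}}."""
    result = {}
    for taxon, per_sample in relabund.items():
        by_treatment = defaultdict(list)
        for sample, value in per_sample.items():
            by_treatment[design[sample]].append(value)
        # Unweighted mean across the samples assigned to each treatment.
        result[taxon] = {t: mean(vals) for t, vals in by_treatment.items()}
    return result
```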

I suspect this has something to do with the group file I made.
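One likely cause is that a mothur group file assigns each sequence name to exactly one group (hence the "must be unique" error), and because each OTU appears in the file only as a name, it counts as a single sequence per group, so read abundances are lost. An alternative worth trying is to transpose the shared file into a count table with one column per sample, keyed by the same OTU names as the modified taxonomy file, and pass it as count= to summary.tax. The sketch below is a hypothetical Python helper, assuming the tab-delimited shared layout label, Group, numOtus, then one column per OTU, and a count_table header of Representative_Sequence, total, then one column per group.

```python
def shared_to_count_table(shared_text, label="0.03"):
    """Transpose a mothur .shared file (samples x OTUs) into a count_table
    (OTUs x samples) with a 'total' column, for a single distance label."""
    lines = [ln.rstrip() for ln in shared_text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    otus = header[3:]  # columns after label, Group, numOtus
    groups, counts = [], []
    for line in lines[1:]:
        fields = line.split("\t")
        if fields[0] != label:
            continue  # skip other distance labels, if any
        groups.append(fields[1])
        counts.append([int(v) for v in fields[3:]])
    out = ["\t".join(["Representative_Sequence", "total"] + groups)]
    for j, otu in enumerate(otus):
        column = [row[j] for row in counts]  # this OTU's count in each sample
        out.append("\t".join([otu, str(sum(column))] + [str(v) for v in column]))
    return "\n".join(out) + "\n"
```

Saved as, say, otu.count_table (a hypothetical name), it would be used as summary.tax(taxonomy=mod.taxonomy, count=otu.count_table, relabund=t). I would expect one relative abundance column per sample, but that is an assumption worth checking against the total column.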

Any advice?

Cheers,

Chris.
