Relative abundance of OTUs (i.e. .tax.summary) for more curated data?


When classify.seqs is run with relabund=T, the file ‘’ that is produced gives an excellent breakdown of the relative abundance of OTUs per sample at each taxonomic level.

However, after running classify.seqs, I continued through the MiSeq SOP with the commands:

mothur > remove.lineage(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table,, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)

mothur > cluster.split(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta,count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table,, splitmethod=classify, taxlevel=4, cutoff=0.15)

mothur > make.shared(,count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, label=0.03, processors=8)

#get the consensus taxonomy for each OTU using:
mothur > classify.otu(,count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table,, label=0.03)

From classify.otu we get our updated *.tax.summary - but there doesn’t appear to be an option to have this expressed as relative abundance.

What I would really like is a breakdown of the relative abundance of OTUs per sample, as in the file * produced by classify.seqs, but based on the more processed data (i.e. with undesirable domains removed, etc.), at the level of the outputs produced by classify.otu.

Is there a way to produce a relative abundance summary like the one we get from classify.seqs, but at the level of output that we get from classify.otu?

Many thanks,


There currently isn’t a relabund option in classify.otu, but thanks for the feature request. As a workaround, you could use the command with some manual modifications to the *.cons.taxonomy file.

OTU Size Taxonomy
Otu0001 12328 Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);unclassified(100);
Otu0002 8914 Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);unclassified(100);
Otu0003 7850 Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);unclassified(100);

The first two columns are basically a count file, and the first and third columns are a taxonomy file. If you created 2 new files from this one file, you could run:, count=yourModCountFile, relabund=t)
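A minimal sketch of that split, in Python, in case it saves someone the manual editing. It assumes the *.cons.taxonomy file is whitespace-delimited with an `OTU Size Taxonomy` header as shown above, and that the count file wants a `Representative_Sequence` / `total` header line; that header is my assumption about the count file format, so check it against a count_table mothur has written for you. The function names are hypothetical.

```python
def split_cons_taxonomy(lines):
    """Split cons.taxonomy rows (OTU, Size, Taxonomy) into count rows and taxonomy rows."""
    # Assumed count-file header; compare with a count_table produced by mothur.
    count_rows = [("Representative_Sequence", "total")]
    tax_rows = []
    for line in lines:
        fields = line.strip().split(None, 2)  # at most 3 fields: OTU, size, taxonomy
        if len(fields) < 3 or fields[0] == "OTU":
            continue  # skip blank lines and the header row
        otu, size, taxonomy = fields[0], fields[1], fields[2].strip()
        count_rows.append((otu, size))    # columns 1 and 2 -> count file
        tax_rows.append((otu, taxonomy))  # columns 1 and 3 -> taxonomy file
    return count_rows, tax_rows


def to_tsv(rows):
    """Join rows into tab-separated text, one row per line."""
    return "".join("\t".join(row) + "\n" for row in rows)


# Two of the rows posted above, used as a small in-memory example.
EXAMPLE = """OTU Size Taxonomy
Otu0001 12328 Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);unclassified(100);
Otu0002 8914 Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);unclassified(100);
"""

count_rows, tax_rows = split_cons_taxonomy(EXAMPLE.splitlines())
print(to_tsv(count_rows))
print(to_tsv(tax_rows))
```

To use it on real data, read the lines of your own *.cons.taxonomy file, then write `to_tsv(count_rows)` and `to_tsv(tax_rows)` to two new files and pass those as the count and taxonomy inputs.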


Thanks for the quick reply.

Your advice worked perfectly to produce a tax.summary file in the right format, with relative abundance. However, in contrast to the * it just has a ‘total’ column as opposed to a column per sample.

I see that there is the argument in for a ‘group’ file. In other commands I have been using a .design file, that looks like…

L1_AAGCGT lower
L1_ACATCG upper
L1_ACCCGT lower
L1_ATTGGC upper
L1_CACTGT upper
L1_CGTGAT upper
L1_CTGATC middle
L1_GAACGA middle
L1_GAATGT middle
L1_GATCTG upper
L1_GCAGTA lower
L1_GCGATT lower
L1_GCTTAC lower
L1_GTAGCC middle
L1_TACAAG middle
L1_TACGGA lower
L1_TCAAGT middle
L1_TGGTCA upper

Is it possible to use or adapt this for the group argument, so that I end up with either a column per group (left column above) or a column per treatment (right column)?

Many thanks for your help.


Further to my above…

I had a look at group files and attempted to make one myself in R using the ‘’ file and the code below:

df1<-read.table("", header=T)


for(i in 1:length(groups)){
assign(nam,x, envir=.GlobalEnv)
print(paste("loop is ",round(i/length(groups)*100), "% done"))
}

write.table(z, file="otu.groups", quote=F, row.names=F, col.names=F, sep="\t")

This gave me a two-column group file: OTU names in the first column and, in the second, the name of a group in which that OTU occurred (i.e. any OTU with a zero score in the file for a given group was dropped for that group).

I then ran the command with the modified .taxonomy file you advised to make in the previous post, and the group file made above with:

mothur>, group=otu.groups, relabund=T)

During the running of the command, mothur repeatedly spat out the following…

Your groupfile contains more than 1 sequence named Otu000001, sequence names must be unique. Please correct.

…probably for all OTUs that were present in all groups. However, mothur also produced a *.tax.summary file that seems to contain relative abundance data for all of my groups.

I then loaded this into R, subset the phylum level of the taxonomy and made a table of treatment averages across my groups. When I compare this with the averages table I made from the classify.otu *.tax.summary file (whose phylum-level data I converted into relative abundance myself), the two differ: the relative abundances per phylum do not match, a couple of the phyla are different, and the newly made *.tax.summary has several phyla with zeros across all groups.
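For reference, this is roughly the calculation being compared, written as a small Python sketch rather than my actual R code. It converts counts to relative abundance within each sample first and only then averages by treatment, using the .design mapping. The phylum counts are invented for illustration and the function names are hypothetical; only the sample names and treatments come from the design file above.

```python
from collections import defaultdict


def relative_abundance(counts):
    """counts: {sample: {taxon: count}} -> {sample: {taxon: fraction of that sample}}"""
    rel = {}
    for sample, taxa in counts.items():
        total = sum(taxa.values())
        rel[sample] = {t: (c / total if total else 0.0) for t, c in taxa.items()}
    return rel


def treatment_means(rel, design):
    """Average per-sample fractions within each treatment; a missing taxon counts as 0."""
    sums = defaultdict(lambda: defaultdict(float))
    n_samples = defaultdict(int)
    for sample, taxa in rel.items():
        treatment = design[sample]
        n_samples[treatment] += 1
        for taxon, value in taxa.items():
            sums[treatment][taxon] += value
    return {
        trt: {taxon: s / n_samples[trt] for taxon, s in taxa.items()}
        for trt, taxa in sums.items()
    }


# Sample -> treatment, taken from the .design file above.
design = {"L1_AAGCGT": "lower", "L1_ACCCGT": "lower", "L1_ACATCG": "upper"}

# Invented phylum-level counts, for illustration only.
counts = {
    "L1_AAGCGT": {"Bacteroidetes": 60, "Firmicutes": 40},
    "L1_ACCCGT": {"Bacteroidetes": 20, "Firmicutes": 80},
    "L1_ACATCG": {"Bacteroidetes": 50, "Firmicutes": 50},
}

means = treatment_means(relative_abundance(counts), design)
print(means)
```

Averaging per-sample proportions generally gives different numbers from pooling the raw counts of a treatment and normalising once, so both tables need to be built the same way before their differences mean anything.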

I guess this is something to do with the groups file I made.

Any advice on the issue?
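One possible route, offered as an untested sketch and not something confirmed in this thread: the error above says a group file needs each name to appear once, so it cannot describe an OTU that occurs in several groups. A count table with one column per group can carry that information, and it can be built by transposing the shared file from make.shared. The Python below assumes the shared file is whitespace-delimited with `label`, `Group` and `numOtus` columns followed by one column per OTU, and that a count table starts with `Representative_Sequence` and `total` followed by the group columns. Whether the summary command accepts such a table for per-group output is an assumption to check. The function name and the toy numbers are hypothetical.

```python
def shared_to_count_rows(shared_lines, label="0.03"):
    """Transpose shared-file rows (one per group) into count-table rows (one per OTU)."""
    header = shared_lines[0].split()
    otus = header[3:]  # columns after label, Group, numOtus
    groups = []
    per_otu = {otu: [] for otu in otus}
    for line in shared_lines[1:]:
        fields = line.split()
        if not fields or fields[0] != label:
            continue  # keep only rows for the requested label
        groups.append(fields[1])
        for otu, value in zip(otus, fields[3:]):
            per_otu[otu].append(int(value))
    # Assumed count-table header: name, total, then one column per group.
    rows = [["Representative_Sequence", "total"] + groups]
    for otu in otus:
        counts = per_otu[otu]
        rows.append([otu, str(sum(counts))] + [str(c) for c in counts])
    return rows


# Toy shared file with invented counts; group names are from the .design file above.
TOY_SHARED = [
    "label\tGroup\tnumOtus\tOtu0001\tOtu0002",
    "0.03\tL1_AAGCGT\t2\t10\t0",
    "0.03\tL1_ACATCG\t2\t5\t7",
]

for row in shared_to_count_rows(TOY_SHARED):
    print("\t".join(row))
```

Because every OTU then appears on exactly one line, the duplicate-name complaint should not arise, and zero counts stay in the table as zeros instead of being dropped.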