Classify OTUs by sample using all sequences

I would like to classify my OTUs by sample (i.e., A2, A8…), but with output that counts the total number of sequences, not just the unique sequences. I am a little confused by the output and the number of sequences it returns.

I originally tried:

mothur > classify.otu(list=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.pick.taxonomy, label=0.03)
reftaxonomy is not required, but if given will keep the rankIDs in the summary file static.
0.03 678313

Output File Names:
stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.cons.taxonomy
stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.cons.tax.summary

I am not sure what 678313 refers to. This number is used in the summary file as what appears to be the total number of sequences for all six samples combined. The summary file is divided by sample, but the paired taxonomy file is not. However, if I add up the numbers in the “size” column of my taxonomy file, I get 8580720, which is my total number of sequences (as reported by summary.seqs prior to cluster.split). My total number of unique sequences was 1269398 prior to the cluster.split command.
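The check described above (counting the rows and adding up the “size” column of the cons.taxonomy file) can be sketched in Python. This is only an illustration: it assumes the file is tab-delimited with a header row containing a column named Size, and the two-row table in the example is made up, not real data.

```python
import csv
import io


def summarize_cons_taxonomy(text):
    """Return (number of OTU rows, sum of the Size column).

    Assumes a tab-delimited table with a header row that has a
    column named "Size" (an assumption about the file layout).
    """
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    n_otus = 0
    total = 0
    for row in reader:
        n_otus += 1
        total += int(row["Size"])
    return n_otus, total


# Made-up two-OTU table, for illustration only.
example = (
    "OTU\tSize\tTaxonomy\n"
    "Otu001\t120\tBacteria(100);\n"
    "Otu002\t30\tBacteria(100);\n"
)

n_otus, total = summarize_cons_taxonomy(example)
print(n_otus, total)
```

For a real file, the text could be read with open(path).read() and passed to the same function. The first number is the count of OTUs and the second is the count of sequences summed over those OTUs.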

I want to know how many of my total sequences classify to OTUs at 97% similarity within each group. I want to compare across groups, to see whether OTU 121 is shared among all samples or just some.
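One possible way to check whether a given OTU appears in every sample is to read the per-sample counts from a shared file such as the one make.shared writes. The sketch below assumes a tab-delimited layout with the header columns label, Group and numOtus followed by one column per OTU. The OTU names, group names and counts in the example are hypothetical.

```python
def otu_counts_by_group(shared_text, otu):
    """Return {group: count} for one OTU column of a shared table.

    Assumes a tab-delimited table whose header contains a "Group"
    column and one column per OTU (an assumption about the layout).
    """
    lines = shared_text.strip().splitlines()
    header = lines[0].split("\t")
    group_col = header.index("Group")
    otu_col = header.index(otu)
    counts = {}
    for line in lines[1:]:
        fields = line.split("\t")
        counts[fields[group_col]] = int(fields[otu_col])
    return counts


# Hypothetical two-sample table, for illustration only.
example_shared = (
    "label\tGroup\tnumOtus\tOtu001\tOtu002\n"
    "0.03\tA2\t2\t5\t7\n"
    "0.03\tA8\t2\t0\t3\n"
)

counts = otu_counts_by_group(example_shared, "Otu001")
present_in = [group for group, n in counts.items() if n > 0]
print(counts, present_in)
```

An OTU is shared among all samples when present_in lists every group. In the made-up table, Otu001 is found only in A2.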

I have also tried

make.shared(list=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, label=0.03)

and then tried to classify using:
mothur > classify.otu(list=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.shared, count=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.pick.taxonomy, label=0.03)
reftaxonomy is not required, but if given will keep the rankIDs in the summary file static.
0.03 0

Output File Names:
stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.cons.taxonomy
stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.cons.tax.summary

Files were empty and contained no information.

Lastly, I have tried variations on

classify.otu(list=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.pick.taxonomy, basis=sequence, persample=true, label=0.03)

classify.otu(list=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.pick.taxonomy, persample=true, label=0.03, reftaxonomy=silva.bacteria.silva.tax)

The numbers in the “size” column of the taxonomy files for each sample are quite large (~4 million) when they should be ~1-1.5 million.

I’m pretty sure 678313 is the number of OTUs you have.

Pat

Is it possible to have the taxonomy file (stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.an.unique_list.0.03.cons.taxonomy) split by sample? My paired summary file is split by sample.

classify.otu(list=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.pick.taxonomy, label=0.03, reftaxonomy=silva.bacteria.silva.tax)

You could run get.groups for each of the groups you want - you’ll have to change the output file names so they don’t overwrite each other.

http://www.mothur.org/wiki/Get.groups
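As a rough sketch of the per-group approach, the short Python script below only builds one get.groups command string per group, ready to paste into mothur or a batch file. It assumes get.groups accepts list, count and groups parameters, which is worth confirming on the wiki page above. The file names my.list and my.count_table are placeholders, and A2 and A8 are the sample names mentioned earlier in the thread.

```python
def get_groups_commands(list_file, count_file, groups):
    """Build one mothur get.groups command string per group.

    The parameter names (list, count, groups) are assumed from the
    get.groups documentation; check them before running.
    """
    return [
        f"get.groups(list={list_file}, count={count_file}, groups={group})"
        for group in groups
    ]


# Placeholder file names; substitute the real list and count_table files.
commands = get_groups_commands("my.list", "my.count_table", ["A2", "A8"])
for command in commands:
    print(command)
```

The output files of each run still have to be renamed before the next one so they do not overwrite each other. The script does not attempt that step, because the output names depend on the input file names.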

Hi Pat and all other users,

Some months ago I had the same problem getting the right classify.otu output for my data. By applying the get.groups command I got the number of unique sequences per OTU per sample. However, because I am interested in the assignment of the total number of sequences, I have only worked with the classify.seqs command so far. But in that case, the sequences were not clustered. How can I get the total number of sequences using classify.otu?

Am I right to use the basis option? E.g., mothur > classify.otu(taxonomy=abrecovery.silva.full.taxonomy, list=abrecovery.fn.list, basis=sequence)

Thanks, one more time :wink:
Cheers, Kristin

That should work.