I would like to classify my OTUs by sample (i.e., A2, A8…), but with output that reports the total number of sequences, not just the unique sequences. I am a little confused by the output and the number of sequences it returns.
I originally tried:
mothur > classify.otu(list=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.pick.taxonomy, label=0.03)
reftaxonomy is not required, but if given will keep the rankIDs in the summary file static.
0.03 678313
Output File Names:
stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.cons.taxonomy
stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.cons.tax.summary
I am not sure what 678313 refers to. The summary file uses this number as what appears to be the total number of sequences for all six samples combined. The summary file is divided by sample, but the paired taxonomy file is not. However, if I add up the values in the “size” column of my taxonomy file, I get 8580720, which is my total number of sequences (as reported by summary.seqs prior to cluster.split). My total number of unique sequences was 1269398 prior to the cluster.split command.
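For reference, here is a minimal sketch of how the “size” column could be totalled outside of mothur. It assumes the cons.taxonomy file is tab-delimited with a header row of OTU, Size, Taxonomy; that layout is my assumption, and the rows below are made-up stand-ins rather than my real data.

```python
import csv
import io


def total_size(handle):
    """Sum the Size column of a tab-delimited cons.taxonomy-style file.

    Assumes a header row containing a column named "Size".
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return sum(int(row["Size"]) for row in reader)


# Made-up example rows standing in for the real cons.taxonomy file.
example = (
    "OTU\tSize\tTaxonomy\n"
    "Otu001\t5\tBacteria(100);\n"
    "Otu002\t3\tBacteria(100);\n"
)
print(total_size(io.StringIO(example)))
```

With a real file, the same function could be called as `total_size(open(path, newline=""))`, where `path` is the cons.taxonomy file written by classify.otu.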
I want to know how many of my total sequences classify to OTUs at 97% similarity within each group. I also want to compare across groups, for example to see whether OTU 121 is shared among all samples or only some of them.
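To illustrate the kind of comparison I am after, here is a rough sketch that reads a shared-style table and reports which groups contain a given OTU. It assumes a tab-delimited file with header columns label, Group, numOtus, followed by one column per OTU; the function names, OTU labels, and counts are hypothetical and only for illustration.

```python
import csv
import io


def groups_with_otu(handle, otu):
    """Return {group: count} for one OTU column of a shared-style table.

    Assumes a tab-delimited file with a "Group" column and one column per OTU.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Group"]: int(row[otu]) for row in reader}


def is_shared_by_all(counts):
    """True if the OTU has at least one sequence in every group."""
    return all(n > 0 for n in counts.values())


# Made-up example with two samples and two OTUs.
example = (
    "label\tGroup\tnumOtus\tOtu001\tOtu002\n"
    "0.03\tA2\t2\t4\t0\n"
    "0.03\tA8\t2\t1\t7\n"
)
counts = groups_with_otu(io.StringIO(example), "Otu002")
print(counts, is_shared_by_all(counts))
```

The per-group counts here are total sequence counts per OTU, which is the number I would like classify.otu to report for each sample.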
I have also tried make.shared(list=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, label=0.03)
and then tried to classify using:
mothur > classify.otu(list=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.shared, count=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.pick.taxonomy, label=0.03)
reftaxonomy is not required, but if given will keep the rankIDs in the summary file static.
0.03 0
Output File Names:
stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.cons.taxonomy
stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.cons.tax.summary
Both output files were empty.
Lastly, I have tried variations on:
classify.otu(list=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.pick.taxonomy, basis=sequence, persample=true, label=0.03)
classify.otu(list=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=stabilityall.trim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.pick.taxonomy, persample=true, label=0.03, reftaxonomy=silva.bacteria.silva.tax)
The numbers in the “size” column of the taxonomy files for each sample are quite large (~4 million) when they should be ~1-1.5 million.