OTU numbering

I’m not sure if this is a bug, a feature, or a user error, so let’s see what you think:

Previously, when we were dealing with 454 data and using the 454 SOP, the end result included shared & taxonomy files where the OTU numbering corresponded to the size of the OTU (that is, the number of reads in that OTU): Otu0001 was the largest, Otu0002 was the second largest, etc.

Now, we’ve switched to MiSeq data and the MiSeq SOP, with cluster.split instead of cluster, and the OTU numbering in the final files doesn’t go according to size anymore. It often seems quite random - I’m just looking at a dataset where the 7th largest Otu is Otu00099. Is this a feature that we’ll just have to learn to live with, or is it more likely that we’re doing something wrong that causes this?

*Edited to add and change: I’ve just run a smaller MiSeq dataset actually using cluster and not cluster.split, and the same thing happened: numbering is not according to size anymore, and I’m pretty sure it used to be. What gives?

What version of mothur are you using?

For these analyses, 1.33.3.

(I’ll be switching to the latest one for the next things that I do.)

Yeah, this was fixed a couple of versions ago. Regardless, update and you should be fine. Also, this isn’t a “bug”, it’s all cosmetic.

Thanks for the answer & sorry for wasting your time - I didn’t realize this was a known thing that had already been dealt with.

Hello again,

I’ve just done a rerun of one of my analyses with mothur 1.34.2 and I’m still seeing the same thing in the results. Otu00001 has 585529 reads, Otu00002 has 959199, and so on. It’s definitely not a huge issue and I can easily fix it myself by just giving new numbers to my OTUs. It’s just strange to me that they used to be in size-order and they don’t seem to be anymore.

Regards,
Velma
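
The manual renumbering mentioned above could be sketched roughly like this in Python. This is not a mothur command: the function name relabel_by_size and the width parameter are made up for illustration, and the two sizes are simply the ones quoted in this post. It only builds an old-to-new label mapping; applying that mapping consistently to both the shared and the taxonomy file is left out.

```python
def relabel_by_size(otu_sizes, width=5):
    """Return {old_label: new_label}, numbering OTUs from largest to smallest.

    otu_sizes: dict of OTU label -> total number of reads (hypothetical input).
    width: number of digits in the new labels (an assumption, not a mothur rule).
    """
    # sorted() is stable, so OTUs of equal size keep their original relative order.
    ranked = sorted(otu_sizes.items(), key=lambda item: -item[1])
    return {
        old: f"Otu{rank:0{width}d}"
        for rank, (old, _size) in enumerate(ranked, start=1)
    }


# Sizes quoted in the post above.
sizes = {"Otu00001": 585529, "Otu00002": 959199}
print(relabel_by_size(sizes))
```

With these two OTUs, the larger one (originally Otu00002) ends up as Otu00001.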

Could you post the commands you are running in mothur?

Hi,

For the whole workflow? Ok, here it is. I’ve run this with both mothur 1.33.3 and 1.34.2, on our lab cluster, with 20 processors both times. The data is V1-V3, MiSeq. We often have more than one sequencing run which we combine, which was also the case here.

-Before mothur, I ran the data through cutadapt for primer removal & quality control.

-For each run separately:

make.contigs(file=file.files, processors=20)

screen.seqs(fasta=file.trim.contigs.fasta, group=file.contigs.groups, maxambig=0, maxlength=550)

unique.seqs(fasta=file.trim.contigs.good.fasta)

align.seqs(reference=silva.nr_v119.align, fasta=file.trim.contigs.good.unique.fasta)

(I just came to realize from talking with a colleague that there’s actually no reason to do this like I did here and start with separate file lists - he’s just handing mothur the whole list of all fastq files from different runs to start with and running it like that. Still, he’s also seeing the thing with the OTU numbering so this shouldn’t be the reason for it.)

-Concatenate .names, .align and .groups files from each run with cat for the rest of the workflow

-With combined files:

count.seqs(name=combined_files.names, group=combined_files.groups)

unique.seqs(fasta=combined_files.align, count=combined_files.count_table)

screen.seqs(fasta=combined_files.unique.align, name=combined_files.unique.count_table, maxhomop=8, start=1046, end=13127)

filter.seqs(fasta=combined_files.unique.good.align, vertical=T, trump=.)

pre.cluster(fasta=combined_files.unique.good.filter.fasta, count=combined_files.unique.good.count_table, diffs=4)

chimera.uchime(fasta=combined_files.unique.good.filter.precluster.fasta, count=combined_files.unique.good.filter.precluster.count_table, dereplicate=T)

remove.seqs(fasta=combined_files.unique.good.filter.precluster.fasta,accnos=combined_files.unique.good.filter.precluster.uchime.accnos)

classify.seqs(fasta=combined_files.unique.good.filter.precluster.pick.fasta, count=combined_files.unique.good.filter.precluster.uchime.pick.count_table, reference=trainset10_082014.pds.fasta, taxonomy=trainset10_082014.pds.tax, cutoff=70)

remove.lineage(fasta=combined_files.unique.good.filter.precluster.pick.fasta, count=combined_files.unique.good.filter.precluster.uchime.pick.count_table, taxonomy=combined_files.unique.good.filter.precluster.pick.pds.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)

cluster.split(fasta=combined_files.unique.good.filter.precluster.pick.pick.fasta, count=combined_files.unique.good.filter.precluster.uchime.pick.pick.count_table, taxonomy=combined_files.unique.good.filter.precluster.pick.pds.wang.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15)

make.shared(list=combined_files.unique.good.filter.precluster.pick.pick.an.unique_list.list, count=combined_files.unique.good.filter.precluster.uchime.pick.pick.count_table, label=0.03)

classify.otu(list=combined_files.unique.good.filter.precluster.pick.pick.an.unique_list.list, count=combined_files.unique.good.filter.precluster.uchime.pick.pick.count_table, taxonomy=combined_files.unique.good.filter.precluster.pick.pds.wang.pick.taxonomy, label=0.03)

I’m having the same problem… I’m using mothur v.1.34.4. Here are the first 20 OTUs from the tax file generated with make.shared().

Otu001 1188644
Otu002 1093135
Otu003 429587
Otu004 290725
Otu005 151539
Otu006 80090
Otu007 80528
Otu008 82717
Otu009 57747
Otu010 62040
Otu011 61119
Otu012 65152
Otu013 57627
Otu014 56982
Otu015 31778
Otu016 26735
Otu017 32870
Otu018 30030
Otu019 17450
Otu020 28263

My workflow:
#Phylotype Batch File, Merging Fasta and Qual files first

trim.seqs(fasta=Sam1_144_L001_R1_001.fasta, oligos=fwdbarcodes.txt, qfile=Sam1_144_L001_R1_001.qual, maxambig=0, maxhomop=8, flip=F, bdiffs=1, pdiffs=2, qwindowaverage=35, qwindowsize=50)
unique.seqs(fasta=current)
count.seqs(name=current, group=current)
chop.seqs(fasta=current, count=current, numbases=230, keep=front)
chimera.uchime(fasta=current, count=current, dereplicate=t)
remove.seqs(fasta=current, accnos=current)
classify.seqs(fasta=current, count=current, template=ITSdb.findley.fasta, taxonomy=ITSdb.findley.taxonomy, method=knn, search=blast, match=2, mismatch=-2, gapopen=-2, gapextend=-1, numwanted=1)
remove.lineage(fasta=current, count=current, taxonomy=current, taxon=unknown)
phylotype(taxonomy=current)
make.shared(list=current, count=current)
classify.otu(list=current, count=current, taxonomy=current, label=1)

We sort the OTUs by abundance. The list file contains only unique sequences when a count file is used. This can result in the less abundant OTUs being printed first because they have more unique names, but less overall abundance.

When you use a count file, the list file contains only the unique sequence names. When running the cluster commands, mothur uses the count file information to create the OTUs. The order that the OTUs are printed in the list file is determined by the number of unique sequences.

With the phylotype command, the count data is not used in determining the OTUs. The order is still determined by the number of unique sequences.

When you run the make.shared command, mothur includes the count file data in the abundance counts, but preserves the order the OTUs had in the list file so that the two files correspond.

Version 1.35.0 will include the count file abundance information in the printing of the list file for the clustering commands. The *.pick commands will not include the change because they preserve the otuLabel order.
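
To illustrate the difference between the two orderings described above, here is a small Python sketch. It is not mothur's code: the function order_otus, the sequence names, and the counts are all invented for the example. Each OTU is a list of unique sequence names, and the count table maps each unique name to the number of reads it represents.

```python
def order_otus(otus, counts, by="unique"):
    """Return OTUs ordered largest-first.

    by="unique": rank by number of unique sequence names in the OTU.
    by="total":  rank by total reads, summing the count table over the OTU.
    """
    if by == "unique":
        key = len
    elif by == "total":
        key = lambda members: sum(counts[name] for name in members)
    else:
        raise ValueError("by must be 'unique' or 'total'")
    return sorted(otus, key=key, reverse=True)


# Made-up data: one OTU with a single very abundant unique sequence,
# and one OTU with three rare unique sequences.
otus = [["seqA"], ["seqB", "seqC", "seqD"]]
counts = {"seqA": 1000, "seqB": 5, "seqC": 3, "seqD": 2}

print(order_otus(otus, counts, by="unique"))  # the three-sequence OTU comes first
print(order_otus(otus, counts, by="total"))   # the 1000-read OTU comes first
```

Ranking by unique names puts the 10-read OTU ahead of the 1000-read OTU, which is the kind of mismatch reported in this thread.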

Dear all,

I have the same “problem” as the one mentioned here. I’ve used the cluster.split function and then make.shared+classify.otu with mothur v.1.44.0. I’ve used cluster.split with a count file. At the end, my OTUs are not sorted by size.
Example:
Otu01 3468
Otu02 846
Otu03 2330
Otu04 3777
I think the OTUs in my .list file after cluster.split are numbered according to the number of unique sequences, not the total number of sequences. Is that possible in mothur v.1.44.0?

Best regards

I am not seeing this issue in our current version. Could you upgrade to our current version, 1.45.3 (Release Version 1.45.3, mothur/mothur on GitHub)?

Hi @westcott ,

Thank you very much for your reply.
I ran the analysis with mothur version 1.45.3 as suggested and encountered the same problem.

mothur v.1.45.3
Last updated: 5/8/21
by Patrick D. Schloss

mothur > cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, splitmethod=fasta,taxlevel=5, cluster=FALSE, cutoff=0.03)
mothur > cluster.split(file=final.file)

mothur > classify.otu(list=final.opti_mcc.list,count=final.count_table,taxonomy=final.taxonomy,label=0.03)


OTU     Size    Taxonomy
Otu0001 4662    Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);"Porphyromonadaceae"_unclassified(99);"Porphyromonadaceae"_unclassified(99);
Otu0002 1371    Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);"Porphyromonadaceae"_unclassified(73);"Porphyromonadaceae"_unclassified(73);
Otu0003 6075    Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);"Porphyromonadaceae"_unclassified(100);"Porphyromonadaceae"_unclassified(100);
Otu0004 1738    Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Bacteroidales"_unclassified(100);"Bacteroidales"_unclassified(100);"Bacteroidales"_unclassified(100);
Otu0005 2643    Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);"Porphyromonadaceae"_unclassified(100);"Porphyromonadaceae"_unclassified(100);
Otu0006 672     Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);"Porphyromonadaceae"_unclassified(100);"Porphyromonadaceae"_unclassified(100);
Otu0007 651     Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);"Porphyromonadaceae"_unclassified(100);"Porphyromonadaceae"_unclassified(100);
Otu0008 693     Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);"Porphyromonadaceae"_unclassified(97);"Porphyromonadaceae"_unclassified(97);
Otu0009 3944    Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);"Porphyromonadaceae"_unclassified(100);"Porphyromonadaceae"_unclassified(100);
Otu0010 644     Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);"Porphyromonadaceae"(100);"Porphyromonadaceae"_unclassified(100);"Porphyromonadaceae"_unclassified(100);
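
A quick way to confirm that the Size column above is not in descending order is a check along these lines. This is only a sketch in Python, not part of mothur; the helper name out_of_order is made up, and the (label, size) pairs are copied from the table above.

```python
def out_of_order(rows):
    """Return labels of OTUs that are larger than the OTU listed just before them."""
    flagged = []
    for (_prev_label, prev_size), (label, size) in zip(rows, rows[1:]):
        if size > prev_size:
            flagged.append(label)
    return flagged


# (OTU, Size) pairs from the classify.otu output above.
rows = [
    ("Otu0001", 4662), ("Otu0002", 1371), ("Otu0003", 6075),
    ("Otu0004", 1738), ("Otu0005", 2643), ("Otu0006", 672),
    ("Otu0007", 651), ("Otu0008", 693), ("Otu0009", 3944),
    ("Otu0010", 644),
]

print(out_of_order(rows))  # ['Otu0003', 'Otu0005', 'Otu0008', 'Otu0009']
```

An empty list would mean the OTUs are already sorted from largest to smallest.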

Thanks for posting the commands and helping us find this bug. The cluster.split command will include the counts in the priority printing of the OTUs when run without the file option. Using the file option will disregard the counts. I have fixed this issue and the change will be part of our next release. In the meantime, can you run these commands instead?

mothur > cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, splitmethod=fasta,taxlevel=5, cutoff=0.03)

mothur > classify.otu(list=final.opti_mcc.list,count=final.count_table,taxonomy=final.taxonomy,label=0.03)