Clustering OTUs

Hi, dear mothur community!

Once again I find myself struggling with my dataset, and I really appreciate being able to share it with other users. I know this is a well-worn question, but I can't find a proper way to deal with the clustering step, so any comments or guidance would help. Thanks in advance.
I have paired-end MiSeq 18S V4 data and am using the SILVA nr_v123 database. After running most of the same QC commands as in the SOP, this is my summary:
             Start  End  NBases   Ambigs  Polymer  NumSeqs
Minimum:     1      921  417      0       4        1
2.5%-tile:   1      923  417      0       5        41073
25%-tile:    1      923  423      0       5        410722
Median:      1      923  424      0       5        821444
75%-tile:    1      923  424      0       6        1232165
97.5%-tile:  1      923  429      0       6        1601814
Maximum:     1      923  430      0       6        1642886
Mean:        1      923  423.812  0       5.35997

# of unique seqs: 786875
total # of seqs: 1642886
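For a sense of scale: an all-vs-all distance calculation over the 786,875 unique sequences in this summary compares n(n-1)/2 pairs, and cluster.split reduces that by only comparing sequences within the same taxonomic group. The back-of-envelope sketch below counts pairs, not bytes, and it is not a model of mothur's actual memory use. The two group layouts (ten equal taxa, or one group holding 90% of the uniques) are hypothetical and chosen only for illustration.

```python
def pairs(n: int) -> int:
    """Number of distinct sequence pairs among n unique sequences: n(n-1)/2."""
    return n * (n - 1) // 2


def within_group_pairs(group_sizes: list[int]) -> int:
    """Pairs compared when distances are only computed inside each group."""
    return sum(pairs(k) for k in group_sizes)


def equal_split(total: int, n_groups: int) -> list[int]:
    """Split `total` sequences into `n_groups` groups whose sizes differ by at most one."""
    base, extra = divmod(total, n_groups)
    return [base + (1 if i < extra else 0) for i in range(n_groups)]


UNIQUES = 786_875  # "# of unique seqs" from the summary above
full = pairs(UNIQUES)

# Hypothetical layout 1: uniques spread evenly over 10 taxa.
even = equal_split(UNIQUES, 10)

# Hypothetical layout 2: 90% of uniques land in a single (e.g. "unclassified") group.
big = UNIQUES * 9 // 10
skewed = [big] + equal_split(UNIQUES - big, 9)

print(f"all vs all: {full:,} pairs")
print(f"10 even groups: {within_group_pairs(even) / full:.1%} of that")
print(f"one dominant group: {within_group_pairs(skewed) / full:.1%} of that")
```

The full comparison is on the order of 3 x 10^11 pairs. An even split cuts that to roughly a tenth, but when one group dominates, roughly four fifths of the work remains, so splitting helps much less.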

I tried cluster.split with taxlevel=4 and taxlevel=5, and cutoff=0.02 and 0.03. Both runs were killed after a few days of running, I assume because of the large files.
I am also trying subsampling, perhaps to 10K reads.
I wanted to use vsearch, but I have now read that it is not recommended for V4.
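On the subsampling idea: subsampling a sample to a fixed depth means drawing that many reads at random, without replacement, from its reads. mothur has a sub.sample command for this; the sketch below is not mothur's code and only illustrates the idea. The function name `rarefy`, the fixed seed, and the sample counts are hypothetical.

```python
import random
from collections import Counter


def rarefy(counts: dict[str, int], depth: int, seed: int = 1) -> dict[str, int]:
    """Keep `depth` reads from one sample, drawn at random without replacement."""
    total = sum(counts.values())
    if total < depth:
        # A sample with fewer reads than the target depth cannot be subsampled to it.
        raise ValueError(f"sample has {total} reads, fewer than depth={depth}")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    pool = [name for name, n in counts.items() for _ in range(n)]  # one entry per read
    return dict(Counter(rng.sample(pool, depth)))


# Hypothetical sample: sequence names and read counts are made up for illustration.
sample = {"seqA": 12_000, "seqB": 3_500, "seqC": 40}
sub = rarefy(sample, depth=10_000)
print(sub, sum(sub.values()))
```

Because reads are drawn without replacement, no sequence can end up with more reads than it had originally, and rare sequences may drop out entirely.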

Any recommendations?
Is my dataset too large to be analyzed with mothur?

Thank you so much!


Your amplicon is >400 bases long? You likely have a decent amount of sequencing error. What did you use for preclustering? I'd try diffs=6 and see if that brings your number of “uniques” down far enough. Also, what groups are your sequences being classified to? If a lot of them are classified to the same group, decreasing the cluster.split taxlevel won't help. I run into this with groups that are “unclassified” below class/order.
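The idea behind raising diffs in pre.cluster is that rarer sequences within a few differences of a more abundant sequence are treated as likely sequencing errors and merged into it, which lowers the number of uniques. The sketch below is a simplified, hypothetical version of that greedy step, not mothur's implementation: it assumes the sequences are already aligned to the same length and counts every mismatched column as one difference.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched columns between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def precluster(abundance: dict[str, int], diffs: int) -> dict[str, int]:
    """Greedily merge rarer sequences into more abundant ones within `diffs` differences."""
    order = sorted(abundance, key=abundance.get, reverse=True)  # most abundant first
    centers: dict[str, int] = {}
    for seq in order:
        for center in centers:
            if hamming(seq, center) <= diffs:
                centers[center] += abundance[seq]  # absorb the rarer sequence
                break
        else:
            centers[seq] = abundance[seq]  # nothing close enough: start a new center
    return centers


# Hypothetical toy data: 4-column "alignments" with made-up abundances.
reads = {"AAAA": 10, "AAAT": 2, "TTTT": 5, "TTTA": 1}
print(precluster(reads, diffs=1))
```

With diffs=1 the four toy uniques collapse to two; with diffs=0 nothing is merged. Total read counts are preserved either way, only the number of distinct sequences changes.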

Thanks for the answer!
I will recheck my QC and maybe rerun it with the new version that was released with the clustering improvements.


I also tried cluster.split with those options and the program quit. Has anyone found a solution for this?

Are you using version 1.39.3? In version 1.39 we changed the default clustering method to opti. The opti method produces better-quality OTU assignments and uses significantly less time and memory.
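For context on "better quality": as described by its authors, the opti method (OptiClust) scores an OTU assignment with the Matthews correlation coefficient (MCC). Pairs within the distance cutoff that share an OTU are true positives, pairs within the cutoff split across OTUs are false negatives, pairs beyond the cutoff that share an OTU are false positives, and the rest are true negatives. The sketch below only computes that score for a given assignment; it does not implement the optimization mothur performs. The function name, the toy sequences, and the returning of 0.0 when the denominator is zero are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def mcc(seqs: list[str], close_pairs: set[frozenset], otu_of: dict[str, int]) -> float:
    """Matthews correlation coefficient of an OTU assignment against a distance cutoff."""
    tp = tn = fp = fn = 0
    for a, b in combinations(seqs, 2):
        close = frozenset((a, b)) in close_pairs  # distance <= cutoff
        same = otu_of[a] == otu_of[b]             # assigned to the same OTU
        if close and same:
            tp += 1
        elif close:
            fn += 1
        elif same:
            fp += 1
        else:
            tn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0  # 0.0 is an assumed convention


# Hypothetical toy data: A~B and C~D are within the cutoff, all other pairs are not.
seqs = ["A", "B", "C", "D"]
close = {frozenset(("A", "B")), frozenset(("C", "D"))}
good = {"A": 1, "B": 1, "C": 2, "D": 2}
lumped = {s: 1 for s in seqs}
print(mcc(seqs, close, good), mcc(seqs, close, lumped))
```

The assignment that matches the cutoff exactly scores 1.0, while lumping everything into one OTU scores 0.0 here.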

I’m going to try it today.