Clustering OTUs

cgimpel · January 19, 2017, 12:39am

Hey dear Mothur community!

I find myself struggling with my dataset again, and I really appreciate being able to share this with other users. I know this is a well-worn question, but I can’t find a proper way to deal with the clustering steps, so any comment or guidance would help. Thanks in advance.
I have MiSeq V4 paired-end 18S data and am using the SILVA nr_v123 database. After running most commands as in the SOP for my QC, this is my summary:
             Start  End  NBases   Ambigs  Polymer  NumSeqs
Minimum:     1      921  417      0       4        1
2.5%-tile:   1      923  417      0       5        41073
25%-tile:    1      923  423      0       5        410722
Median:      1      923  424      0       5        821444
75%-tile:    1      923  424      0       6        1232165
97.5%-tile:  1      923  429      0       6        1601814
Maximum:     1      923  430      0       6        1642886
Mean:        1      923  423.812  0       5.35997
# of unique seqs: 786875
total # of seqs: 1642886

I tried cluster.split with taxlevel 4 and 5, cutoff=0.02/3. Both runs were killed (I assume because of the large files) after running for a few days.
I am also trying to subsample, maybe to 10K reads.
I wanted to use vsearch, but I have now read that it is not recommended for V4.
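
For readers unfamiliar with subsampling, here is a minimal Python sketch of the idea: draw a fixed number of reads per sample at random, without replacement. It is only an illustration under assumed inputs. The function name `subsample_reads`, the read IDs, and the 25,000-read sample are all made up, and in practice mothur's own sub.sample command would do this step.

```python
import random


def subsample_reads(read_ids, depth, seed=0):
    """Return `depth` read IDs drawn without replacement, or None if too few."""
    if len(read_ids) < depth:
        return None  # this sample has fewer reads than the requested depth
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(read_ids, depth)


# Hypothetical sample with 25,000 reads, reduced to 10,000.
reads = [f"read_{i}" for i in range(25_000)]
subset = subsample_reads(reads, 10_000)
print(len(subset), "reads kept,", len(set(subset)), "distinct")
```

Samples with fewer reads than the chosen depth return None here, so they would have to be dropped or the depth lowered.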

Any recommendations?
Is my data too large to be analyzed with mothur?

Thank you so much!

Carla

Kendra · January 24, 2017, 6:04pm

Your amplicon is >400 bases long? You likely have a decent amount of sequencing error. What did you use for preclustering? I’d try diffs=6 and see if that drops your “uniques” down low enough. Also, what groups are your sequences being ID’d to? If a lot are ID’d to the same group, decreasing the cluster.split taxlevel won’t help. I run into this for groups that are “unclassified” below class/order.
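
The point about one large group can be made concrete with a rough count of pairwise comparisons. The Python sketch below is an illustration only: the two group-size lists are invented proportions, not taken from this dataset, and it counts comparisons rather than modelling what mothur actually stores (which also depends on the cutoff).

```python
def pair_count(n):
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def split_workload(group_sizes):
    """Total pairwise comparisons when each group is handled on its own."""
    return sum(pair_count(size) for size in group_sizes)


UNIQUES = 786_875  # unique sequences reported in the summary earlier in the thread

# Hypothetical split 1: 100 groups of (almost) equal size.
q, r = divmod(UNIQUES, 100)
even = [q + 1] * r + [q] * (100 - r)

# Hypothetical split 2: 99 small groups plus one dominant group,
# e.g. everything that is "unclassified" below class/order.
skewed = [UNIQUES - 99 * 1_000] + [1_000] * 99

print(f"no split:     {pair_count(UNIQUES):.2e} comparisons")
print(f"even split:   {split_workload(even):.2e} comparisons")
print(f"skewed split: {split_workload(skewed):.2e} comparisons")
```

Because the count grows with the square of the group size, the largest group dominates the total, which is why splitting at a finer taxonomic level gives little relief when most sequences land in a single group.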

cgimpel · January 25, 2017, 8:50pm

Thanks for the answer!!
I will recheck my QC and rerun, maybe using the new version that was released with the clustering improvements.

Carla

heyheyhey · February 28, 2017, 3:10pm

I too tried cluster.split with those options and the program quit. Has there been a solution for this?

westcott · February 28, 2017, 7:49pm

Are you using version 1.39.3? In version 1.39 we changed the default clustering method to opti. The opti method produces better quality OTU assignments and uses significantly less time and memory.

heyheyhey · March 1, 2017, 6:19pm

I’m going to try it today.
