Hey dear Mothur community!
I find myself struggling with my dataset again, and I really appreciate being able to share this with other users. I know this is a well-worn question, but I can’t find a proper way to deal with the clustering steps. Any comment or guidance would help. Thanks in advance.
I have MiSeq paired-end 18S V4 data and am using the SILVA nr_v123 reference database. After running most of the same commands as in the SOP for my QC, this is my summary:
             Start  End  NBases   Ambigs  Polymer  NumSeqs
Minimum:     1      921  417      0       4        1
2.5%-tile:   1      923  417      0       5        41073
25%-tile:    1      923  423      0       5        410722
Median:      1      923  424      0       5        821444
75%-tile:    1      923  424      0       6        1232165
97.5%-tile:  1      923  429      0       6        1601814
Maximum:     1      923  430      0       6        1642886
Mean:        1      923  423.812  0       5.35997
# of unique seqs: 786875
total # of seqs: 1642886
I tried cluster.split with taxlevel 4 and 5 and cutoff=0.02/0.03. Both runs got killed (I assume because of the large files) after running for a few days.
I am also trying to subsample, maybe to 10K reads.
I wanted to use vsearch, but I have now read that it is not recommended for V4.
Any recommendations?
Is my dataset too large to be analyzed with mothur?
Thank you so much!
Carla