I am running mothur v1.39.5.
Using the full Silva 132 for alignment.
Silva 132 and PDS 16 for taxonomy, cutoff 70.
I used vsearch for clustering (yeah, I know).
I have a mock community containing Salmonella
When I use classify.seqs, I see Salmonella sequences being classified.
When I use classify.otu, I do not see them anymore!
It was working fine before. Changes made since then: I used to use Silva v128.pcr for alignment, a classification cutoff of 80, and both Silva v128 (which had a hard time finding Salmonella) and PDS 16 (which found Salmonella with ease) for taxonomy.
Could it be a clustering problem? (My bet is on this, but it is weird since I never had this problem before.)
Or should I lower the classification cutoff once more, to 60?
Or anything else?
OK, so I reran using:
cluster(column=current, count=current, method=opti)
Now I have Salmonella in my cons.taxonomy, but in only 2 samples, compared to dozens of samples plus my mock community when I look at the classification of the sequences alone.
It seems like these sequences are getting binned with something else.
So I will go radical: run OptiClust at 0.01 instead of 0.03 and lower the cutoff to 60 for classify.seqs.
If that does not work, I will go back to my old settings to see if the problem comes from using Silva v132 for aligning.
Still super weird.
Are you able to track the Salmonella sequences and see what they’re being classified as under classify.otu?
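One way to do that tracking outside mothur is sketched below in Python. It is only an illustration under stated assumptions: the two strings stand in for a .list file and a classify.seqs .taxonomy file, the sequence and OTU names are made up, the function names are hypothetical, and per-sequence abundances from a count table are ignored.

```python
import re

# Hypothetical stand-ins for a mothur .list file and a .taxonomy file.
# All sequence names, OTU names and lineages here are invented toy data.
LIST_TEXT = (
    "label\tnumOtus\tOtu001\tOtu002\n"
    "0.03\t2\tseqA,seqB,seqC,seqD\tseqE\n"
)
TAXONOMY_TEXT = (
    "seqA\tBacteria(100);Enterobacteriaceae(100);Escherichia/Shigella(95);\n"
    "seqB\tBacteria(100);Enterobacteriaceae(100);Escherichia/Shigella(91);\n"
    "seqC\tBacteria(100);Enterobacteriaceae(100);Escherichia/Shigella(88);\n"
    "seqD\tBacteria(100);Enterobacteriaceae(100);Salmonella(84);\n"
    "seqE\tBacteria(100);Lachnospiraceae(100);Blautia(97);\n"
)


def parse_list(text):
    """Map each OTU name to its member sequence names (first label only)."""
    header, row = [line.split("\t") for line in text.strip().splitlines()[:2]]
    return {otu: members.split(",") for otu, members in zip(header[2:], row[2:])}


def parse_taxonomy(text):
    """Map each sequence name to its lineage, with confidence scores removed."""
    taxonomy = {}
    for line in text.strip().splitlines():
        name, lineage = line.split("\t")
        taxonomy[name] = [
            re.sub(r"\(\d+\)$", "", rank) for rank in lineage.strip(";").split(";")
        ]
    return taxonomy


def otus_containing(taxon, otus, taxonomy):
    """Return {otu: (matching sequences, total sequences)} for OTUs holding the taxon."""
    hits = {}
    for otu, members in otus.items():
        matching = [seq for seq in members if taxon in taxonomy.get(seq, [])]
        if matching:
            hits[otu] = (len(matching), len(members))
    return hits


otus = parse_list(LIST_TEXT)
taxonomy = parse_taxonomy(TAXONOMY_TEXT)
hits = otus_containing("Salmonella", otus, taxonomy)
print(hits)
```

In this toy data the single Salmonella sequence sits in an OTU with three Escherichia/Shigella sequences, which is the kind of bin that would hide it in the consensus taxonomy.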
OK, I chickened out yesterday and used 0.02 instead of 0.01. I got more positive samples, but the results are still very different from the classify.seqs results.
Running at 0.01 this morning.
To answer your question, dwaite: I have not checked, but I definitely will. In my mock community, I have a lot of Enterobacteriaceae_unclassified, so I believe this is where my Salmonella are, as my E. coli/Shigella are showing up properly. I previously had poor classification of Salmonella using Silva and cutoff=80, but I was seeing them using PDS and the same cutoff. So I usually do taxonomic assignment with both databases and report differences between the assignments when I see them.
But now, it is just weird. What is weirder is that I have another student working on 16S analysis. She did the alignment against Silva v132 but using the V4 region only (pcr.align file), and her mock community looks quite correct. But in my samples, I know that I have 2 Salmonella serovars (from culture), so that might be what is causing the problem when clustering.
I will see what I get using a cutoff of 0.01. I will then rerun with “SOP-like” parameters but using the V4 alignment file.
Darn it, I have an important report due next week, talk about a bad time for the bioinformatics to play tricks on me!
I bet vsearch is your issue. Salmonella is low abundance, right? So it’s not being used to seed any clusters; it’s being added to clusters that, depending on how they were seeded, could be other Enteros, which is why the OTUs are Entero unclassified.
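To make that mechanism concrete, here is a small Python sketch of a majority-style consensus over the sequences in one OTU. It is a simplified illustration, not mothur's actual classify.otu code: the function name is hypothetical, the 51% threshold is an assumption, the lineages are toy data, and every sequence is weighted equally.

```python
from collections import Counter


def consensus_taxonomy(lineages, cutoff=51):
    """Simplified consensus: keep a rank only if >= cutoff % of sequences share it.

    `lineages` is a list of rank lists, one per sequence in the OTU.
    Assumes cutoff > 50 so that at most one lineage can pass at each rank.
    """
    total = len(lineages)
    depth = min(len(lineage) for lineage in lineages)
    result = []
    for level in range(depth):
        # Count full lineage prefixes so identical names under different parents stay separate.
        prefixes = Counter(tuple(lineage[: level + 1]) for lineage in lineages)
        best, count = prefixes.most_common(1)[0]
        if 100 * count / total < cutoff:
            parent = result[-1] if result else "unknown"
            result.append(f"{parent}_unclassified")
            break
        result = list(best)
    return result


# Toy lineages (invented for illustration).
ESCHERICHIA = ["Bacteria", "Enterobacteriaceae", "Escherichia/Shigella"]
SALMONELLA = ["Bacteria", "Enterobacteriaceae", "Salmonella"]

# Salmonella is the minority in the OTU: the consensus names the majority genus.
minority = consensus_taxonomy([ESCHERICHIA] * 7 + [SALMONELLA] * 3)

# Neither genus reaches the cutoff: the OTU falls back to the family level.
split = consensus_taxonomy([ESCHERICHIA] * 5 + [SALMONELLA] * 5)

print(minority)
print(split)
```

Under these assumptions, a minority of Salmonella sequences either takes on the name of the majority genus or leaves the OTU labelled Enterobacteriaceae_unclassified, even though each sequence on its own classifies as Salmonella.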
kmitchell, you could not have summed this up more accurately.
However, I believe this is a general clustering problem, not only a vsearch problem.
I have now run OptiClust with a cutoff of 0.01, and 106 samples out of 160 now match (both positive or both negative for Salmonella when comparing the OTU-based and sequence-based classifications).
It seems to get better the lower the cutoff I use.
That is also probably why it seems particular to this experiment: something else is clustering with our Salmonella, something never seen before (at least in my lab)! I will try unique next, to see if it improves. But still, if I had not had my mock community, I would have been screwed, as this experiment involves Salmonella and I am a bacteriologist, so names on sequences do matter, especially for pathogens. It also raises concerns for past and future experiments, at least for my lab.
I will report back.
If you have any idea on how to influence clustering so that my Salmonella do not cluster with something else, feel free to post!
Would using cluster.split, at genus level, solve the problem and still be valid?
I reclassified at cutoff 80.
Then cluster.split at taxonomic level 6 (genus), opti, cutoff 0.03.
I got agreement on 153 out of 160 samples in terms of Salmonella-positive samples between classify.seqs and classify.otu. It is getting better.
I am still concerned about this though, but this kinda “fixed” the problem.
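For anyone wanting to reproduce this kind of agreement count, a minimal Python sketch follows. The sample names and the two sets of positives are invented, and the function name is hypothetical; in practice the sets would be built from the classify.seqs and classify.otu outputs.

```python
def salmonella_agreement(seq_positive, otu_positive, all_samples):
    """Compare per-sample Salmonella calls from sequence-level and OTU-level results."""
    both_positive = seq_positive & otu_positive
    both_negative = all_samples - seq_positive - otu_positive
    return {
        "agree": len(both_positive) + len(both_negative),
        "total": len(all_samples),
        # Samples where Salmonella is seen in the sequences but lost at the OTU level.
        "lost_in_otus": sorted(seq_positive - otu_positive),
        "only_in_otus": sorted(otu_positive - seq_positive),
    }


# Toy data: six hypothetical samples, one of them the mock community.
all_samples = {"S1", "S2", "S3", "S4", "S5", "Mock"}
seq_positive = {"S1", "S2", "S3", "Mock"}  # positive by classify.seqs
otu_positive = {"S1", "Mock"}              # positive by classify.otu

summary = salmonella_agreement(seq_positive, otu_positive, all_samples)
print(summary)
```

Listing the disagreeing samples, rather than only counting matches, shows directly which samples lose their Salmonella at the OTU step.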
After I did that, I classified my OTUs against Silva and guess what, the mock community is all wrong.
So it seems you cannot run cluster.split with the RDP taxonomy and then reclassify the resulting OTUs against Silva, because the two do not match.
I kinda need both classifications, as RDP is better at classifying Enterobacteriaceae but Silva is a lot better at classifying Lachnos and Ruminos.
Well, that’s not cool.
I need a clustering that is independent of taxonomy but that can force my Salmonella into their own OTU. I will see what I can come up with.
Hi! Could you tell me if you solved the problem? I now have the same issue…
I have not tried it on the newest version of mothur.
The only fix I found was to tighten OTU clustering to 99% similarity instead of 97%.
I actually chose to move to DADA2 in R because of that bug. My Salmonella were getting binned with another bacterium's sequences, and since they were not the majority of sequences in that OTU, they were not chosen for the classification and thus disappeared.
I did not try the newest options in the latest mothur version, sorry. I would try denoising with alternative options such as Deblur and setting the OTU definition at 99%.
Best of luck, and please tell us if it solves the problem with mothur.
Thanks a lot for your reply. I could not solve it and I also moved to DADA2, which does not show this issue.
Best of success with your analysis!
Sent: January 7, 2019, 13:58