Lots of unique seqs and too few chimeras?

Dear mothur creators,

I am not sure where to post this, so I will try here.
I would like your opinion on a problem (if it can be called that) that I’ve encountered.

I have ~200 samples of the V4 region, unfortunately sequenced with 150 bp paired-end reads. I have always worked with 250 bp paired-end reads. Although make.contigs with complete overlap and subsequent QC still resulted in lots of unique sequences for environmental and marine invertebrate microbiota, I usually ended up with reasonably few unique sequences for human gut microbiota.

This time, even after precluster, I ended up with ca. 400,000 unique sequences. What puzzles me more is that very few of them (2-3% at most, even after fiddling with parameters to increase the sensitivity) turn out to be chimeras, regardless of the method used. I am not saying it’s wrong, but I do find it strange, as I’ve always had a considerably higher proportion of unique sequences marked as potentially chimeric.

What do you think about this? Is it normal to have so few chimeras?
Thanks in advance for your opinion,
kind regards,

EDIT: So I suppose the problem is simply that, with the short overlap, I am left with too many errors. I never remove singletons from my analysis, but do you think it is reasonable to do it this time (before chimera removal)?
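
To put a rough number on "short overlap", here is a minimal sketch of the arithmetic. It assumes a V4 amplicon of about 253 bp with primers not included in the reads; that length and the function name `overlap_bp` are illustrative assumptions, not values taken from this dataset.

```python
def overlap_bp(read_len, amplicon_len):
    """Bases covered by both the forward and the reverse read.

    Each read can cover at most the whole amplicon, so the read
    length is capped at the amplicon length before computing overlap.
    """
    covered = min(read_len, amplicon_len)
    return max(0, 2 * covered - amplicon_len)


AMPLICON_LEN = 253  # assumed V4 amplicon length, not measured here

for read_len in (150, 250):
    ov = overlap_bp(read_len, AMPLICON_LEN)
    share = ov / AMPLICON_LEN
    print(f"2x{read_len}: {ov} bp overlap ({share:.0%} of the amplicon)")
```

Under these assumptions, only a small part of each contig is read twice with 2x150, so most positions cannot be corrected by the second read, which would be consistent with the large number of unique sequences.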

Chimera detection can be impacted by sequencing errors, so that is a possibility. Other than that, I’m not really sure what to suggest. What percentage of all of your sequences (counting duplicates) are chimeras?
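
The two percentages in question (share of unique sequences vs. share of all sequences counting duplicates) can differ a lot, so it may help to compute both. Below is a minimal sketch with made-up numbers; the dictionary `counts`, the set `chimeric`, and the function name are hypothetical stand-ins for what you would read from your count table and from the list of sequences flagged as chimeric.

```python
def chimera_fractions(counts, chimeric_names):
    """Return (fraction of uniques flagged, fraction of total reads flagged).

    counts: dict mapping unique sequence name -> number of reads it represents
    chimeric_names: set of unique sequence names flagged as chimeric
    """
    total_uniques = len(counts)
    total_reads = sum(counts.values())
    flagged_uniques = sum(1 for name in counts if name in chimeric_names)
    flagged_reads = sum(n for name, n in counts.items() if name in chimeric_names)
    return flagged_uniques / total_uniques, flagged_reads / total_reads


# Toy data, purely illustrative (not from this dataset)
counts = {"seq1": 900, "seq2": 80, "seq3": 10, "seq4": 5, "seq5": 5}
chimeric = {"seq3", "seq5"}

by_unique, by_reads = chimera_fractions(counts, chimeric)
print(f"chimeric uniques: {by_unique:.1%}, chimeric reads: {by_reads:.1%}")
```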


I am really sorry for the very late reply; I’ve been working on something completely different for the last month.
For vsearch, if I run it without groups, it is about 10%, and that is with cutoff=0.1.
I know there is no really good solution to this. It is also not that I want my sequences to be chimeras if they are not :D, but based on my previous experience, the proportion seems low.
Do you think that being strict with the cutoff in this case can alleviate the problem at least a bit? At least in the sense that it would keep the false negative rate as low as possible and reduce the dataset a bit? Or won’t it matter anyway, cos I don’t have a good overlap and my seqs are simply erroneous :frowning:
Thanks for your time and as well for the new mothur version.