I have followed the method set out in the MiSeq SOP to calculate the error rate and the number of spurious OTUs. I sequenced the same HMP mock community that you mention, and I am using the V3 region of the 16S rRNA gene (160 bp).
My error rate is low (0.0003%), but I have an incredibly large number of OTUs (~400). 26 of these OTUs are highly abundant; the rest are a handful of doubleton OTUs and hundreds of singleton OTUs. In my mock community file, I have around 88,000 sequences and 840 unique sequences.
Does the above information scream sequencing errors? Any further comments on this would be valuable.
Congrats on using a mock community - that’s great! So there are a few things to think about.
The error rate does not include chimeras. Even after chimera removal, we know that some persist that we can’t cull. Some of your extra OTUs may in fact be undetected chimeras.
88,000 is a lot of sequences, so even at a very low error rate you would expect some extra OTUs.
Some of those extra OTUs may be low-level contamination. These would probably show up as being a decent number of mismatches away from the references.
And perhaps most importantly, it’s really hard to talk about alpha diversity or membership-based statistics in an absolute manner. Rather, it’s really best to think about them in terms of relative comparisons between samples.
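To put rough numbers on the point about sequencing depth, here is a back-of-the-envelope sketch in Python. It assumes errors are independent and equally likely at every position, which real sequencing errors are not, and the function name is made up for illustration. It counts reads that carry at least one error, which is only a loose upper bound on the reads that could seed a spurious OTU, since a read with a single error will most likely still cluster with its parent sequence. Because it is easy to mix up a percentage and a fraction, the sketch evaluates the reported rate both ways.

```python
def expected_erroneous_reads(n_reads, read_length, per_base_error):
    """Expected number of reads with at least one error.

    Assumes independent, uniform per-base errors (a simplification).
    """
    # Probability that a single read has one or more erroneous bases.
    p_read_has_error = 1.0 - (1.0 - per_base_error) ** read_length
    return n_reads * p_read_has_error


n_reads = 88_000      # sequences in the mock community (from the question)
read_length = 160     # V3 region length in bp (from the question)

# 0.0003% read literally is 3e-6 per base; 3e-4 is what it would be
# if the value were actually a fraction rather than a percentage.
for label, rate in (("0.0003% (3e-6 per base)", 3e-6),
                    ("0.0003 as a fraction (3e-4 per base)", 3e-4)):
    n_bad = expected_erroneous_reads(n_reads, read_length, rate)
    print(f"{label}: about {n_bad:.0f} reads with at least one error")
```

Under these assumptions the first case gives roughly 40 reads with an error and the second roughly 4,000, so it is worth double-checking which units the reported error rate is in before deciding how surprising the singleton count is.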