I am analyzing a dataset in which I included a mock community of 20 species. After clustering (with usearch), removing contaminants, etc., I looked at the mock sample and found 19 OTUs with more than 400 sequences each, 2 OTUs with about 40 reads, and then 88 OTUs with 1-18 sequences. Is this common? Most of these unexpected low-abundance OTUs are abundant in the other samples of this dataset, so I wonder whether the problem could be caused by barcode switching. If I can't rule out that this also happened in my "real" samples, how should I calculate, for instance, the number of species per sample? I understand that for alpha diversity measurements I should include everything, including singletons. Does anyone have advice or recommendations?
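For context, one way the mock sample could be used here is as a rough gauge of cross-talk: take the largest unexpected OTU in the mock as a fraction of the mock's total reads, then drop OTUs below that fraction of each real sample's total before counting species. This is only a minimal sketch, assuming every OTU in the mock that is not one of the 20 expected species came from cross-talk (e.g. barcode switching) rather than from sequencing errors or contaminants. The function names and the per-sample threshold rule are hypothetical, not a usearch feature.

```python
from typing import Dict, Set


def crosstalk_rate(mock_counts: Dict[str, int], expected: Set[str]) -> float:
    """Largest unexpected OTU in the mock as a fraction of the mock's total reads.

    Assumption: every OTU not in `expected` is cross-talk from other samples.
    """
    total = sum(mock_counts.values())
    if total == 0:
        raise ValueError("mock sample has no reads")
    unexpected = [n for otu, n in mock_counts.items() if otu not in expected]
    # With no unexpected OTUs, the estimated rate is zero.
    return max(unexpected, default=0) / total


def filter_sample(counts: Dict[str, int], rate: float) -> Dict[str, int]:
    """Keep OTUs whose count exceeds rate * sample total (hypothetical threshold rule)."""
    threshold = rate * sum(counts.values())
    return {otu: n for otu, n in counts.items() if n > threshold}


if __name__ == "__main__":
    # Toy counts for illustration only, not real data.
    mock = {"A": 500, "B": 450, "X": 10, "Y": 2}
    rate = crosstalk_rate(mock, expected={"A", "B"})
    sample = {"A": 1000, "C": 5, "D": 300}
    kept = filter_sample(sample, rate)
    print(f"rate={rate:.4f}, richness after filtering={len(kept)}")
```

With these toy numbers the rate is 10/962, so in a sample of 1305 reads any OTU with about 13 reads or fewer is dropped, leaving 2 OTUs. The obvious trade-off is that this also removes genuine rare OTUs, including the singletons that richness estimators rely on.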
Thanks in advance