Dealing with sequence errors in alpha diversity calculations

ademenez · September 29, 2016, 1:12pm

Hi all,

I am analyzing a data set in which I included a mock community with 20 species. After clustering (using usearch), removing contaminants, etc., I looked at the mock sample and found 19 OTUs with more than 400 sequences each, 2 OTUs with about 40 reads, and then 88 OTUs with 1-18 sequences. Is this common? Most of these unwanted low-abundance OTUs are abundant in the other samples from this dataset, so I wonder whether the problem could be caused by barcode switching. If I can’t rule out that this also happened in my “real” samples, how should I calculate, for instance, the number of species per sample? I understand that alpha diversity measurements should include everything, including singletons. Does anyone have any advice or recommendations?

Thanks in advance

Alex
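One quick way to quantify the problem described above is to count how many mock OTUs pass a series of minimum-abundance cutoffs and compare each count with the 20 species known to be in the mock. Below is a minimal Python sketch, assuming the mock sample's OTU counts are available as a plain list of integers. The function name and the default cutoffs are hypothetical, and this is not a mothur or usearch command.

```python
from typing import Dict, Iterable, List


def mock_otu_summary(
    counts: List[int],
    expected_species: int,
    cutoffs: Iterable[int] = (1, 10, 100),  # hypothetical cutoffs, adjust as needed
) -> Dict[int, Dict[str, int]]:
    """For each minimum-abundance cutoff, report how many mock OTUs have at
    least that many reads and how far that is from the expected species count."""
    summary: Dict[int, Dict[str, int]] = {}
    for cutoff in cutoffs:
        n_pass = sum(1 for n in counts if n >= cutoff)
        # Positive "excess" = more OTUs than species (spurious OTUs);
        # negative = fewer OTUs than species pass this cutoff.
        summary[cutoff] = {"otus": n_pass, "excess": n_pass - expected_species}
    return summary
```

This is only a diagnostic for the mock sample. It shows how the spurious OTUs are distributed across abundance classes; it is not a recommendation to apply the same cutoffs to the real samples, where dropping rare OTUs would bias richness estimates.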

pschloss · September 30, 2016, 3:52pm

I would not report absolute diversity metrics; rather, I would report them relative to your other samples. As you are finding, there are a variety of sources of error (including contamination of your mock community from reagents, etc.). By treating everything as “relative”, you can safely assume that everything is equally good/bad.

Pat
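One common way to make such relative comparisons is to compare every sample at the same sequencing depth: subsample each sample, without replacement, to the size of the smallest one, then average the observed OTU count over several random draws. Below is a minimal Python sketch of that idea, assuming per-sample OTU counts are already available as integer lists. The helper names (`rarefy`, `observed_otus`, `relative_richness`) are hypothetical. Within mothur you would use its own subsampling (e.g. the `sub.sample` command) rather than this code.

```python
import random
from typing import Dict, List


def rarefy(counts: List[int], depth: int, seed: int = 0) -> List[int]:
    """Subsample `depth` reads without replacement from an OTU count vector."""
    total = sum(counts)
    if depth > total:
        raise ValueError(f"depth {depth} exceeds sample size {total}")
    # Expand the counts into one entry per read, labelled by OTU index.
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    picked = random.Random(seed).sample(reads, depth)
    sub = [0] * len(counts)
    for otu in picked:
        sub[otu] += 1
    return sub


def observed_otus(counts: List[int]) -> int:
    """Number of OTUs with at least one read (singletons included)."""
    return sum(1 for n in counts if n > 0)


def relative_richness(
    samples: Dict[str, List[int]], iters: int = 100, seed: int = 0
) -> Dict[str, float]:
    """Mean observed OTUs per sample after rarefying all samples to the
    smallest sample size, averaged over `iters` random subsamples."""
    depth = min(sum(c) for c in samples.values())
    rng = random.Random(seed)
    result: Dict[str, float] = {}
    for name, counts in samples.items():
        total = 0
        for _ in range(iters):
            total += observed_otus(rarefy(counts, depth, seed=rng.randrange(2**32)))
        result[name] = total / iters
    return result
```

The rarefied values still contain spurious OTUs, so they should not be read as true species counts. Because every sample is processed the same way and compared at the same depth, the differences between samples remain interpretable, which is the point of reporting the metrics relative to one another.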

ademenez · October 12, 2016, 9:33am

Thanks Pat, appreciated!

Alex
