Pipeline optimization: total reads vs unique reads

staj · February 12, 2020, 9:53pm

hi

First, thanks for this wonderful software and for keeping it so relevant.

I am working on pipeline optimization and have a mock with 8 bacteria and 2 fungi.

I get about 1200 OTUs, and our error rate is 0.08%. I think the high OTU count is mainly due to the large number of reads going in (around 100-200k). Most reads come from the first 8 or so OTUs.

I saw in your 2013 paper that you used 5000 total reads per sample, but I am worried this may not be enough for complex samples.

Might I use 5000 unique reads per sample instead?

What do you think?

pschloss · February 13, 2020, 1:25pm

Hi,

I think people make too much of the number of reads going into a sample. Regardless of the number of reads you use, you need to have the same number of total reads (not unique reads) in each sample when you rarefy the data.

Pat
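
To illustrate the reply above, here is a minimal Python sketch of rarefying every sample to the same number of total reads. It is not mothur's implementation: the function name `subsample_counts`, the sample names, and all counts are made up for illustration. Each read is treated as one draw, so an OTU with 6000 reads contributes 6000 entries to the pool rather than one.

```python
import random
from collections import Counter


def subsample_counts(otu_counts, depth, seed=0):
    """Draw `depth` reads without replacement from one sample's OTU counts.

    `otu_counts` maps an OTU label to its total read count (hypothetical data).
    """
    total = sum(otu_counts.values())
    if depth > total:
        raise ValueError(f"cannot draw {depth} reads from a sample with {total}")
    rng = random.Random(seed)
    # Expand the table to one entry per read, so abundant OTUs are
    # proportionally more likely to be drawn than rare ones.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    return Counter(rng.sample(reads, depth))


# Toy count tables (assumed values, not from the thread).
samples = {
    "mock": {"Otu1": 6000, "Otu2": 3000, "Otu3": 900, "Otu4": 100},
    "soil": {"Otu1": 2000, "Otu2": 2000, "Otu5": 1500, "Otu6": 500},
}

# Use the smallest sample's total read count as the common depth.
depth = min(sum(counts.values()) for counts in samples.values())
rarefied = {name: subsample_counts(counts, depth) for name, counts in samples.items()}

for name, counts in rarefied.items():
    print(name, sum(counts.values()), "reads,", len(counts), "OTUs")
```

After this step every sample holds the same number of total reads, while the number of distinct OTUs (or unique sequences) is free to differ between samples.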

staj · February 19, 2020, 9:45pm

hi

I tried making a rarefaction plot manually with my 120K reads and 1200 OTUs by downsampling the reads and thus reducing OTUs proportionately.

This did not work because apparently rarefaction plots are made by rarefying unique reads.

pschloss · February 20, 2020, 3:03pm

They’re made using the total number of sequences, not the number of unique sequences.

pat
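
A rough sketch of how a rarefaction curve can be built from total reads, in line with the reply above. This is not mothur's code: `rarefaction_curve` and the toy community are assumptions for illustration. The reads are shuffled, and the number of distinct OTUs seen in the first `d` reads is averaged over several shuffles.

```python
import random


def rarefaction_curve(otu_counts, depths, iters=20, seed=0):
    """Mean number of OTUs observed when drawing `d` total reads, for each d."""
    # One entry per read, so sampling is over total reads, not unique ones.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if max(depths) > len(reads):
        raise ValueError("a requested depth exceeds the number of reads")
    rng = random.Random(seed)
    totals = [0] * len(depths)
    for _ in range(iters):
        rng.shuffle(reads)
        for i, d in enumerate(depths):
            # The first d reads of a shuffled list are a random subsample.
            totals[i] += len(set(reads[:d]))
    return [t / iters for t in totals]


# Hypothetical community: 8 dominant OTUs plus 100 rare ones.
community = {f"dominant{i}": 1000 for i in range(8)}
community.update({f"rare{i}": 3 for i in range(100)})

total_reads = sum(community.values())
depths = [total_reads // 4, total_reads // 2, total_reads]
curve = rarefaction_curve(community, depths)

for d, richness in zip(depths, curve):
    print(d, "reads ->", richness, "OTUs on average")
```

In this toy case, halving the reads removes far fewer than half of the OTUs, so reducing OTUs in proportion to the reads would not reproduce the curve.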
