quality scores with the alignment

mdavids · June 3, 2016, 8:43am

Hello,

I've just received my HiSeq dataset and have been playing around with quality filtering to reduce the amount of data (~700k sequences per sample is just too much!). For the first analysis I just discarded 90% of the data and ran my usual workflow, which works fine.

To potentially improve the output and use the entire dataset, I wanted to test the effects of different quality filtering criteria.
I've found that quality filtering heavily impacts my community composition downstream.
For example, my most abundant phylotype drops from 50% relative abundance to around 10% as the quality filtering becomes more stringent.

In order to avoid removing sequences based on quality and having to address this bias, I was wondering if it would be better to remove columns from the alignment based on the quality scores of the sequences. Is there a way to get quality scores associated with the alignment? And if not, is this something that could be implemented?

Cheers

Mark

pschloss · June 7, 2016, 1:30pm

I think using HiSeq data is going to be very perilous (http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix/). If you want to go that route, you should really have Mock community data so you know what your error rates are as you make decisions throughout your pipeline.

Pat

mdavids · June 8, 2016, 11:38am

Hello Pat,

Thanks for the reply. The HiSeq was something the sequencing company chose to run. As I'm strapped for time, rerunning with MiSeq is not really an option.

My concern is that some amplicons will return with ambiguous or poor-quality base calls more easily than others, and thus removing sequences based on quality biases the observed composition. Instead, I would rather align all the data and remove poor-quality columns to reduce the total number of sequences that I need to cluster. What would be your concerns with this approach?
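
To make the column-removal idea concrete, here is a minimal Python sketch. It is not a mothur feature or command. It assumes each aligned read is a string with `-` or `.` for gaps and that the per-base quality scores are available in read order; the function names and the `min_mean_q` threshold are hypothetical and only for illustration. Reads that differ only in the dropped columns would become identical and could then be collapsed before clustering.

```python
from statistics import mean

GAP_CHARS = "-."  # assumed gap symbols in the aligned reads


def column_qualities(aligned_seq, quals):
    """Place per-base quality scores onto alignment columns; gaps get None."""
    scores = iter(quals)
    return [None if ch in GAP_CHARS else next(scores) for ch in aligned_seq]


def low_quality_columns(aligned_seqs, qual_lists, min_mean_q=25):
    """Return indices of columns whose mean quality over non-gap bases is below min_mean_q."""
    per_seq = [column_qualities(s, q) for s, q in zip(aligned_seqs, qual_lists)]
    bad = []
    for col in range(len(aligned_seqs[0])):
        col_scores = [row[col] for row in per_seq if row[col] is not None]
        if col_scores and mean(col_scores) < min_mean_q:
            bad.append(col)
    return bad


def drop_columns(aligned_seqs, columns):
    """Remove the given column indices from every aligned read."""
    skip = set(columns)
    return ["".join(ch for i, ch in enumerate(s) if i not in skip) for s in aligned_seqs]


# Toy data, invented for illustration only.
seqs = ["AC-GT", "ACGGT", "A-CGT"]
quals = [[38, 36, 12, 35], [37, 35, 30, 10, 36], [38, 30, 14, 34]]

bad = low_quality_columns(seqs, quals)
filtered = drop_columns(seqs, bad)
```

In this toy example only the fourth column has a low mean quality, so it is the one removed from all three reads. A mean per column is just one possible criterion; a median or a fraction-of-bases-below-threshold rule would fit the same structure.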

Mark

pschloss · June 9, 2016, 4:51pm

I think you’re trading biases. You’re probably going to be stuck with phylotyping the data and going from there.
