Hello,
I've just received my HiSeq dataset and have been playing around with quality filtering to reduce the amount of data (~700k sequences per sample is just too much!). For the first analysis I simply discarded 90% of the data and ran my usual workflow, which works fine.
To potentially improve the output and use the entire dataset, I wanted to test the effects of different quality filtering criteria.
I've found that quality filtering heavily impacts my community composition downstream.
For example, my most abundant phylotype drops from 50% relative abundance to around 10% as the quality filtering becomes more stringent.
To avoid removing sequences based on quality and then having to address this bias, I was wondering whether it would be better to remove columns from the alignment based on the quality scores of the sequences. Is there a way to get quality scores associated with the alignment? And if not, is this something that could be implemented?
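To make the idea concrete, here is a rough Python sketch of what I have in mind. It is not an existing feature of any tool; the function names (column_mean_quality, columns_to_keep) and the min_q threshold are made up for illustration. It assumes each sequence's quality scores are in the same orientation and trimmed the same way as the aligned sequence, so that the i-th non-gap character lines up with the i-th quality score.

```python
def column_mean_quality(aligned, quals, gap_chars="-."):
    """Project per-base quality scores onto alignment columns.

    aligned: dict of name -> aligned sequence (all the same length)
    quals:   dict of name -> list of quality scores for the unaligned bases
    Returns a list with the mean quality per column (None for all-gap columns).
    """
    n_cols = len(next(iter(aligned.values())))
    totals = [0] * n_cols
    counts = [0] * n_cols
    for name, seq in aligned.items():
        scores = quals[name]
        i = 0  # position in the unaligned quality list
        for col, base in enumerate(seq):
            if base in gap_chars:
                continue  # gaps carry no quality score
            totals[col] += scores[i]
            counts[col] += 1
            i += 1
    return [t / c if c else None for t, c in zip(totals, counts)]


def columns_to_keep(means, min_q=25):
    """Flag columns whose mean quality is at least min_q (hypothetical cutoff)."""
    return [m is not None and m >= min_q for m in means]


# Toy example with two aligned reads
aligned = {"read_a": "AC-GT", "read_b": "A-TGT"}
quals = {"read_a": [30, 30, 10, 40], "read_b": [30, 20, 20, 20]}

means = column_mean_quality(aligned, quals)
mask = columns_to_keep(means, min_q=25)
filtered = {name: "".join(b for b, keep in zip(seq, mask) if keep)
            for name, seq in aligned.items()}
```

With something like this, low-quality columns would be dropped from every sequence at once instead of whole sequences being thrown out, which is the behaviour I am hoping for.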
Cheers
Mark