Can someone tell me whether it is necessary to perform the chimera check before aligning the sequences, or is it OK to do it afterwards? The 454 SOP performs the chimera check after preclustering the data. I have talked to many bioinformaticians about this issue and still do not understand it. Is there a definitive answer?
My recommendation, and this isn’t based on any formal benchmarks, is that you align first. The reason differs depending on your chimera screening protocol, but the end result is the same either way.
Reference-based chimera checking - If you’re doing reference-based checking then aligning first allows you to align/trim/filter the non-chimeric database to the same region as your amplicons so you have a direct comparison between target and query sequences.
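To make the "same region" idea concrete, here is a rough Python sketch, not how any particular tool implements it. It assumes the reference sequences are already aligned and that you know which alignment columns your amplicons cover; the function name and the column coordinates are made up for illustration.

```python
from typing import Dict

GAP_CHARS = ".-"  # assumed gap characters in the aligned reference


def trim_reference_to_region(
    reference: Dict[str, str], start_col: int, end_col: int
) -> Dict[str, str]:
    """Cut each aligned reference sequence down to columns start_col..end_col
    (0-based, inclusive) and drop references with no bases in that window."""
    trimmed = {}
    for name, seq in reference.items():
        window = seq[start_col:end_col + 1]
        # Keep the reference only if it actually has bases in the amplicon region.
        if any(ch not in GAP_CHARS for ch in window):
            trimmed[name] = window
    return trimmed


# Toy data: ref_b has no bases in the amplicon region, so it is dropped.
toy_reference = {"ref_a": "AAAACCCCGGGG", "ref_b": "AAAA--------"}
print(trim_reference_to_region(toy_reference, 4, 7))
```

After this step the reference and the query sequences span the same columns, so the chimera checker compares like with like.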
De novo chimera checking - Although the global alignment is not used for the uchime approach, aligning your sequences first allows you to identify and discard sequences that cannot be aligned to the template (and hence probably don’t represent ‘real’ 16S genes). This means that when you do the chimera checking you don’t have spurious amplicons being factored into the calculations.
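As a rough illustration of discarding poorly aligned reads before the chimera check, here is a minimal Python sketch. It is not the SOP's code: the function names, the region coordinates and the minimum base count are all assumptions chosen for the toy example.

```python
from typing import Dict, Tuple

GAP_CHARS = {".", "-"}  # assumed gap characters in the alignment


def aligned_span(aligned_seq: str) -> Tuple[int, int]:
    """Return the 0-based columns of the first and last base, or (-1, -1) if all gaps."""
    positions = [i for i, ch in enumerate(aligned_seq) if ch not in GAP_CHARS]
    if not positions:
        return (-1, -1)
    return (positions[0], positions[-1])


def screen_before_chimera_check(
    aligned: Dict[str, str], region_start: int, region_end: int, min_bases: int
) -> Dict[str, str]:
    """Keep only reads that cover region_start..region_end and have enough bases.
    Reads that fail are treated as spurious and never reach the chimera checker."""
    kept = {}
    for name, seq in aligned.items():
        start, end = aligned_span(seq)
        if start == -1:
            continue  # nothing aligned at all
        n_bases = sum(1 for ch in seq if ch not in GAP_CHARS)
        if start <= region_start and end >= region_end and n_bases >= min_bases:
            kept[name] = seq
    return kept


# Toy alignment: only read_ok spans the target region.
toy_alignment = {
    "read_ok": "..ACGT-ACGTAC..",
    "read_short": "......ACG......",
    "read_unaligned": "...............",
}
kept = screen_before_chimera_check(toy_alignment, region_start=3, region_end=11, min_bases=8)
print(sorted(kept))
```

Only the reads that survive this screen would then be passed to the chimera checker, so the spurious amplicons never enter its calculations.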
As I said though, I don’t have any data that backs this up. It just seems to me that it’s best to clean up your data as much as possible prior to chimera screening, as chimera testing is probably one of the more…imprecise quality control steps in a 16S pipeline.