I am working on a 16S amplicon study in which, among other samples, we sequenced a mock community to test our method.
I’m using mothur to analyze the data, and one of the commands I use is seq.error(). To assess the error rates and the extent of chimera formation in the reads generated by PCR-amplifying the mock community, I use seq.error() to compare the aligned reads (generated with align.seqs, with SILVA as the reference) against an alignment consisting solely of 16S sequences from taxa that are present in the mock (basically a subset of the SILVA alignment).
If I understand its chimera detection method properly, it constructs a “database” of all possible chimeras and checks whether a read is more than 3 bp closer to a possible chimera than to a real sequence. Is this still the case?
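To make my understanding concrete, here is a minimal Python sketch of that kind of check. It is not mothur's code: the function names, the simple mismatch count, and the `margin=3` default are my own assumptions, and it expects toy aligned sequences of equal length.

```python
from itertools import permutations


def mismatches(a, b):
    """Count aligned columns where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_single(read, refs):
    """Return (distance, name) of the closest single reference."""
    return min((mismatches(read, seq), name) for name, seq in refs.items())


def best_chimera(read, refs):
    """Return (distance, left_parent, right_parent, breakpoint) of the
    closest two-parent chimera, trying every ordered pair and breakpoint."""
    best = None
    for (left, lseq), (right, rseq) in permutations(refs.items(), 2):
        for bp in range(1, len(read)):
            # Hypothetical chimera: left parent up to bp, right parent after.
            d = mismatches(read, lseq[:bp] + rseq[bp:])
            if best is None or d < best[0]:
                best = (d, left, right, bp)
    return best


def is_chimeric(read, refs, margin=3):
    """Flag a read whose best chimera is more than `margin` mismatches
    closer than its best single reference (assumed rule, see lead-in)."""
    single_distance, _ = best_single(read, refs)
    chimera = best_chimera(read, refs)
    return chimera is not None and single_distance - chimera[0] > margin


# Toy references and reads, invented for illustration only.
refs = {"A": "AAAAAAAAAAAA", "B": "CCCCCCCCCCCC"}
true_chimera = "AAAAAACCCCCC"   # half A, half B
noisy_read = "AAAAAAAAAAAC"     # A with a single error at the end

print(is_chimeric(true_chimera, refs))
print(is_chimeric(noisy_read, refs))
```

In this toy case the half-and-half read is flagged, while the read with a single terminal error is not, because its best chimera is only 1 mismatch closer than reference A.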
Now, I have observed some strange behaviour. By accident, I had included a couple of incomplete 16S sequences in my mock reference alignment. As a result, >90% of the reported chimeras (numparents = 2 in the .error.chimera file) listed the shortest incomplete 16S sequence as one of the parents. In addition, the reported breakpointChi was nearly always within 10 bp of the coordinate where that incomplete 16S sequence ended, and in the majority of cases it fell exactly on that coordinate.
Then, when I removed the shortest incomplete 16S sequence from the reference, the next-shortest incomplete 16S sequence was nearly always reported as one of the chimera parents, and the breakpointChi was reported around the region where that sequence ended.
Finally, when I removed all incomplete 16S sequences from the alignment and/or replaced them with complete 16S sequences, the number of reported chimeras dropped substantially. Still, a sizeable fraction of the reported chimeras listed the shortest complete 16S sequence in the reference as one of the parents.
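For anyone who wants to check their own reference for truncated sequences like these, a rough Python sketch follows. It assumes a mothur/SILVA-style aligned FASTA in which `.` and `-` mark gaps; the helper names and the `start`/`end` columns (the alignment region the reads span) are hypothetical and would need to be set per dataset.

```python
GAPS = ".-"  # gap characters assumed for mothur/SILVA-style alignments


def read_fasta(lines):
    """Parse FASTA text lines into a {name: sequence} dict."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def span(aligned):
    """Return (first, last) 0-based alignment columns that hold a base,
    or None if the sequence is all gaps."""
    cols = [i for i, c in enumerate(aligned) if c not in GAPS]
    return (cols[0], cols[-1]) if cols else None


def incomplete_refs(refs, start, end):
    """Names of references that do not cover columns start..end inclusive."""
    flagged = []
    for name, seq in refs.items():
        s = span(seq)
        if s is None or s[0] > start or s[1] < end:
            flagged.append(name)
    return flagged


# Toy alignment, invented for illustration only.
toy = [">full", "ACGTACGT", ">short", "ACGT-A..", ">late", "..GTACGT"]
refs = read_fasta(toy)
print(incomplete_refs(refs, 0, 7))
```

On a real file, the list passed to `read_fasta` would be the opened reference alignment, and any name returned by `incomplete_refs` would be a candidate to remove or replace with a full-length sequence.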
To me, this suggests that seq.error() reports many reads as chimeras that may not actually be chimeras.
Is this something you are aware of? And how should we work around this problem?