Removing chimeric sequences

Hi there,

The Chimera Slayer updates have got me thinking about the best way to deal with chimeric sequences: is it valid to simply remove them from the analysis altogether?

I can understand removing chimeric sequences if they arise from random effects and are present at low levels (e.g. 1% of the total sequences). Removing chimeras will also give a better idea of the actual diversity.

However, it seems that the proportion of chimeric sequences can get quite high in amplicon pyrosequencing data (>10% of total sequences, or even higher in some cases). In addition, according to the Chimera Slayer paper (Haas et al. 2011), chimeras are more likely to form between 16S rRNA sequences from highly abundant organisms. Chimeras between certain pairs of sequences are also more likely, and are reproducible between independent amplifications. This suggests to me that chimera formation is not just a random process (noise), and that removing chimeras may actually artificially lower the counts of the more abundant organisms.

It would be very interesting to hear your thoughts on this, or to be told if I am completely on the wrong track with my thinking.

I also noticed that mothur's chimera.slayer function has a trim option (set to false by default), which outputs each chimeric sequence trimmed down to its longest piece. Is it valid to include these trimmed sequences in subsequent analysis? Any ideas on the best approach to dealing with chimeric sequences if they are present at a significant level in your data (>10%)?


Hi Kelvin,

All good questions. Based on tests with mock communities and synthetic chimeras, the false positive rates are quite low. Chimeras that are left in the data are more likely to artificially increase the number of OTUs/phylotypes. The next iteration of the Costello analysis will move the chimera checking to after the pre-clustering step. There we would suggest supplying the name file as well as setting reference=self. This will allow you to check for chimeras by stepping from the most to the least abundant sequence, using the more abundant sequences as potential parents of the chimera.
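
To make the ordering idea concrete, here is a rough Python sketch of how candidate parents could be picked when the data set serves as its own reference. It is only an illustration of the abundance ordering described above, not mothur's actual code: the function name candidate_parents, the sequence names and the counts are all made up, and the rule that a parent must be strictly more abundant than the query is an assumption on my part.

```python
from typing import Dict, Iterator, List, Tuple


def candidate_parents(abundance: Dict[str, int]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (query, parents) pairs, from the most to the least abundant query.

    Hypothetical helper: `abundance` maps a unique sequence name to the
    number of reads it represents (the information a name file carries).
    """
    # Sort by descending abundance; break ties by name so the order is stable.
    ordered = sorted(abundance, key=lambda name: (-abundance[name], name))
    for i, query in enumerate(ordered):
        # Assumption: only strictly more abundant sequences can be parents.
        parents = [p for p in ordered[:i] if abundance[p] > abundance[query]]
        yield query, parents


# Made-up counts, purely for illustration.
counts = {"seqA": 500, "seqB": 320, "seqC": 12, "seqD": 3}
for query, parents in candidate_parents(counts):
    print(query, "could be checked against", parents)
```

The most abundant sequence has no candidate parents, so under this scheme it is never tested as a chimera, while rare sequences are compared against everything more abundant than themselves.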