I just ran a chimera check on 96 libraries and obtained very high percentages of chimeras (ranging from 20% to 50%).
I used the following command:
chimera.uchime(fasta=sequences.precluster.fasta, count=sequences.precluster.count_table, dereplicate=t)
These sequences were aligned using the silva.bacteria reference files, screened using screen.seqs, filtered using filter.seqs, and pre.clustered prior to running the chimera check.
I searched the mothur forum and found one post from 2009 about high chimera rates, but that thread ended without a resolution being posted.
Is there something I might be doing wrong? I would not expect to have so many chimeras in my data.
Anecdotal, but I’ve found this to be the case in all of my studies as well. Here’s a reference that sequenced a mock community and found similar chimera rates (32% and 36%): http://dx.doi.org/10.1007/s12275-012-2642-z
Thank you very much for the reference. It’s helpful, at least in helping me decide whether or not to keep these runs.
Since I didn’t prepare these samples, I’m curious now to know how many PCR cycles were used in the protocol. I will definitely be asking our provider.
I notice this post is two years old already, but was there any resolution to the high % chimera problem? I’m running into the same problem (some of my communities even have up to 60% chimeras). I’m following the mothur SOP, my code looks essentially the same. I’ve used 35 cycles to amplify my amplicons (working with low-biomass sediment samples) so this may partially explain my numbers. Any recommendations?
One thing that has been seen in the literature is that lengthening the extension times reduces the rate of chimera formation. This is why we use a 5 minute extension time. It doesn’t get rid of all of the chimeras, but we generally see <15% of our total sequences being chimeras. The data for this observation are tucked away in the Supplement of this: http://www.ncbi.nlm.nih.gov/pubmed/21212162
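When comparing numbers like these, it helps to be clear about whether a percentage refers to unique sequences or to total reads, since the two can differ a lot. Here is a minimal Python sketch of that calculation. It assumes an uncompressed, tab-delimited count table with "Representative_Sequence" and "total" columns, and a set of flagged sequence names such as you would read from an accnos file. The function name and the example data are made up for illustration, and it treats a flagged sequence as chimeric in every sample, which is a simplification when dereplicate=t is used.

```python
import csv
import io


def chimera_percentages(count_table_text, flagged_names):
    """Return (percent of unique sequences, percent of total reads) flagged."""
    reader = csv.DictReader(io.StringIO(count_table_text), delimiter="\t")
    unique_all = unique_flagged = 0
    reads_all = reads_flagged = 0
    for row in reader:
        reads = int(row["total"])
        unique_all += 1
        reads_all += reads
        if row["Representative_Sequence"] in flagged_names:
            unique_flagged += 1
            reads_flagged += reads
    return (100.0 * unique_flagged / unique_all,
            100.0 * reads_flagged / reads_all)


# Made-up example data, not from any real run.
example_table = (
    "Representative_Sequence\ttotal\tsampleA\tsampleB\n"
    "seq1\t90\t50\t40\n"
    "seq2\t6\t6\t0\n"
    "seq3\t3\t0\t3\n"
    "seq4\t1\t1\t0\n"
)
example_flagged = {"seq2", "seq3", "seq4"}

pct_unique, pct_reads = chimera_percentages(example_table, example_flagged)
print(f"{pct_unique:.1f}% of unique sequences, {pct_reads:.1f}% of reads")
# prints "75.0% of unique sequences, 10.0% of reads"
```

In this toy case three of the four unique sequences are flagged, but they account for only 10 of 100 reads, so it is worth stating which of the two figures is being reported.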
Thanks for the reference, Pat.
I’ve noticed that many of these sequences that were identified as chimeras produce great alignments when I blast them, so now I’m going to see how many of these are actually false positives, and whether they bias the rest of the community.
Remember that chimeras are most likely to form between closely related parents, so it would not be surprising for them to align well.
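To make that point concrete, here is a small Python sketch with made-up sequences (not real 16S data). Two hypothetical parents differ at 4 of 40 positions, and the chimera takes its first half from one parent and its second half from the other. The helper name and the crossover position are assumptions chosen only for illustration.

```python
def percent_identity(a, b):
    """Percent of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Made-up parents: 40 bases, differing at 4 positions (2 in each half).
parent_a = "ACGT" * 10
differences = {2: "T", 12: "C", 25: "A", 35: "G"}
parent_b = "".join(differences.get(i, base) for i, base in enumerate(parent_a))

# Chimera: first half from parent A, second half from parent B.
crossover = 20
chimera = parent_a[:crossover] + parent_b[crossover:]

print(percent_identity(parent_a, parent_b))  # 90.0
print(percent_identity(chimera, parent_a))   # 95.0
print(percent_identity(chimera, parent_b))   # 95.0
```

The chimera is closer to each parent than the parents are to each other, so a good match to a database sequence does not by itself show that a flagged sequence is a false positive.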
This topic is very interesting… I have the same issue with my last two runs. I always feel bad about throwing away 30 to 45% of my sequences.
Pat, in your opinion, is it a serious scientific problem to carry out the downstream analysis after removing such a large fraction of the sequences?
Figuring out what is really a chimera in silico is difficult. The current consensus seems to be to be very strict with chimera detection for surveys. I think the current algorithms may be too strict, but I also don’t think it’s worth fighting to keep those sequences, because there are better wet-lab ways to figure out whether things are true chimeras or false positives. If fine differences between close relatives are your research interest, it is better to attack the question with data other than V4.