Chimera identification in seq.error

Hello,

I’ve been using seq.error to look at error rates in some Illumina mock community data. The documentation for seq.error is a little unclear, however, about how exactly it handles chimeras. I was hoping to get some clarification on two points:

First, how is chimera identification done in seq.error? I looked at the values in the .error.chimera file, and it looks like a sequence must have at least 3 fewer mismatches to the best chimera than to the best single reference in order to be called chimeric. Is this indeed the criterion used by mothur?

Second, I want to fine-tune parameters for chimera.uchime (or possibly another algorithm) in order to apply it to real data from the same run (my sequences are relatively short, so chimera.uchime’s default parameters don’t work so well). Is it a good idea to use the chimeras called by seq.error to evaluate parameter choices (i.e. calculate % of chimeras removed and false positive rates for different thresholds)? Or is there a better, more robust way to train chimera.uchime for my data?

Thanks,
Joe

Sorry for the scant documentation. Need to work on that…

First, how is chimera identification done in seq.error? I looked at the values in the .error.chimera file, and it looks like a sequence must have at least 3 fewer mismatches to the best chimera than to the best single reference in order to be called chimeric. Is this indeed the criterion used by mothur?

We describe the chimera calling method for seq.error in this paper: "Reducing the Effects of PCR Amplification and Sequencing Artifacts on 16S rRNA-Based Studies" (Schloss, Gevers, and Westcott, PLoS ONE, 2011).

Second, I want to fine-tune parameters for chimera.uchime (or possibly another algorithm) in order to apply it to real data from the same run (my sequences are relatively short, so chimera.uchime’s default parameters don’t work so well). Is it a good idea to use the chimeras called by seq.error to evaluate parameter choices (i.e. calculate % of chimeras removed and false positive rates for different thresholds)? Or is there a better, more robust way to train chimera.uchime for my data?

seq.error is probably the best way. Alternatively, you could use Robert Edgar’s software to develop your own training set, where you know which sequences are chimeras and where in the sequence they were formed. This is how he created the SIMM datasets that he used in testing UCHIME. I believe it is described in the UCHIME paper.
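For the evaluation itself, here is a minimal Python sketch of the bookkeeping, assuming you have already pulled out three sets of sequence names: every sequence in the mock community data, the ones seq.error called chimeric, and the ones chimera.uchime flagged at a given parameter setting. The function name and the toy sequence names are made up for illustration and are not part of mothur. The sketch treats the seq.error calls as ground truth, which is itself an assumption, and it counts unique sequences without weighting by abundance.

```python
def evaluate_chimera_calls(all_seqs, true_chimeras, flagged):
    """Compare one set of chimera calls against a reference set.

    all_seqs      -- names of every sequence that was checked
    true_chimeras -- names treated as real chimeras (assumed here to be
                     the seq.error calls on the mock community)
    flagged       -- names flagged by the detector being tuned
    """
    all_seqs = set(all_seqs)
    truth = set(true_chimeras) & all_seqs
    flagged = set(flagged) & all_seqs
    non_chimeras = all_seqs - truth

    true_pos = len(flagged & truth)          # chimeras correctly removed
    false_pos = len(flagged & non_chimeras)  # good sequences wrongly removed

    # Guard against empty groups rather than dividing by zero.
    sensitivity = true_pos / len(truth) if truth else float("nan")
    fpr = false_pos / len(non_chimeras) if non_chimeras else float("nan")

    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "sensitivity": sensitivity,
        "false_positive_rate": fpr,
    }


# Toy data (hypothetical names): 10 sequences, 4 of them treated as chimeras.
all_seqs = {f"seq{i}" for i in range(1, 11)}
true_chimeras = {"seq1", "seq2", "seq3", "seq4"}
flagged = {"seq1", "seq2", "seq3", "seq7"}

result = evaluate_chimera_calls(all_seqs, true_chimeras, flagged)
print(result["sensitivity"], result["false_positive_rate"])
```

You would run this once per parameter setting and compare the fraction of chimeras removed against the false positive rate to pick a threshold.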

Pat