I’m currently testing ChimeraSlayer on a functional gene dataset with a self-compiled reference database. However, ChimeraSlayer is flagging some of the reference sequences as chimeras. How is this possible? (It might seem redundant, but I need to include the reference sequences in the dataset to be analysed.)
The sequences in both datasets have exactly the same length and are aligned in exactly the same way, since the gene in question is quite conserved (same number of columns, same gap positions, etc.). I’m running the application with default values plus the trim and split options.
This does not happen with UCHIME, even though the sets of chimeras detected by the two tools overlap substantially.
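The claim that both files share one alignment layout (same number of columns, same gap positions) can be verified mechanically before digging into ChimeraSlayer itself. The following is a minimal sketch, assuming both inputs are aligned FASTA using '-' or '.' as gap characters and that a sequence present in both files carries the same ID in each; the function names are hypothetical helpers, not part of ChimeraSlayer, mothur or any other tool.

```python
from typing import Dict, Iterable, Tuple

# Assumption: '-' and '.' are the only gap characters used in the alignments.
GAP_CHARS = set("-.")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {id: sequence}, joining wrapped sequence lines.

    The ID is the first whitespace-delimited token of the header; a duplicated
    ID silently overwrites the earlier record.
    """
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def gap_mask(seq: str) -> Tuple[bool, ...]:
    """True at every alignment column that holds a gap character."""
    return tuple(c in GAP_CHARS for c in seq)


def layout_report(queries: Dict[str, str], refs: Dict[str, str]) -> Tuple[set, list]:
    """Return (distinct alignment widths, shared IDs whose gap layout differs)."""
    widths = {len(s) for s in queries.values()} | {len(s) for s in refs.values()}
    mismatched = sorted(
        sid for sid in queries.keys() & refs.keys()
        if gap_mask(queries[sid]) != gap_mask(refs[sid])
    )
    return widths, mismatched
```

Called on the query and reference files (for example `layout_report(read_fasta(open(query_path)), read_fasta(open(ref_path)))`, with placeholder paths), a single width and an empty mismatch list would confirm that the two alignments really are column-for-column identical.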
Well, one possibility is that there was recombination in your gene of interest. If you look at the original ChimeraSlayer paper you’ll see that they also detected a low level of recombination in 16S rRNA gene sequences. You might try using the de novo-based approach instead.
Recombination is certainly a possibility, although I cannot verify it at this stage.
However, my question was really about how ChimeraSlayer fundamentally works. Within a single run, it flags sequences that are present both in the analysed dataset and in the reference database. This seems counterintuitive, since one would assume that sequences in the template database are intrinsically non-chimeric, so a query that matches a reference sequence 100% should never be flagged.
Am I just misunderstanding how ChimeraSlayer works, or could this somehow be an issue with the reference database compilation itself, or something else?
I have already double-checked the alignments and sequence headers for mistakes introduced during sequence manipulation.
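One further check that separates the two explanations raised in the question (a problem with the reference compilation versus something in how ChimeraSlayer handles a query that is also a reference) is to confirm that every flagged query really has an exact, gap-independent counterpart in the reference file. The following is a minimal sketch under stated assumptions: sequences are aligned FASTA with '-' or '.' gaps, comparison ignores gaps and letter case but treats ambiguity codes literally, and the list of flagged IDs is extracted from the ChimeraSlayer report separately (its output format is not parsed here). All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

GAP_CHARS = "-."  # assumption: the only gap symbols in the alignments


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {id: sequence}; the ID is the first header token."""
    records: Dict[str, str] = {}
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def ungap(seq: str) -> str:
    """Drop alignment gaps and upper-case so only the residues are compared."""
    return "".join(c for c in seq if c not in GAP_CHARS).upper()


def exact_reference_matches(queries: Dict[str, str],
                            refs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each query ID to the reference IDs with an identical ungapped sequence."""
    index = defaultdict(list)
    for rid, seq in refs.items():
        index[ungap(seq)].append(rid)
    return {qid: sorted(index.get(ungap(seq), [])) for qid, seq in queries.items()}


def flagged_without_reference(flagged: Iterable[str],
                              matches: Dict[str, List[str]]) -> List[str]:
    """Flagged query IDs that have no exact counterpart among the references."""
    return sorted(qid for qid in flagged if not matches.get(qid))
```

If `flagged_without_reference` returns an empty list, every flagged sequence is genuinely identical to a reference, which would point away from a compilation error; any IDs it does return are worth inspecting first.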
This is part of a meta-analysis of genes in public databases, and unfortunately I cannot apply de novo-based methods without going through each of the hundreds of individual datasets.