Hi, I’ve been hesitating to ask these questions for a while now.
Because I get different results with different algorithms, I was wondering what the right (no, wait: the most correct) way to deal with this is. Not only does the number of apparent chimeras differ, but the overlap (the same sequences being flagged as chimeric by different algorithms) is generally low. So combining the outputs increases the number of sequences to be culled.
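To put a number on that overlap instead of eyeballing it, a small Python sketch like the one below could compare the flagged sets. It assumes each tool's flagged sequence names have been exported to a plain-text list with one name per line (mothur's .accnos files, for example, have this shape, but check your own output); `load_ids` and `compare_detectors` are hypothetical helper names and the IDs in the demo are made up.

```python
from itertools import combinations


def load_ids(path):
    """Read flagged sequence names, one per line (first token of each non-empty line)."""
    with open(path) as fh:
        return {line.split()[0] for line in fh if line.strip()}


def compare_detectors(flagged):
    """flagged: dict mapping detector name -> set of flagged sequence IDs.

    Returns the size of the union (what gets culled if all outputs are
    combined) and, for every pair of detectors, the number of shared IDs
    and their Jaccard index (shared / union of that pair).
    """
    union = set().union(*flagged.values())
    pairs = {}
    for a, b in combinations(sorted(flagged), 2):
        shared = flagged[a] & flagged[b]
        either = flagged[a] | flagged[b]
        jaccard = len(shared) / len(either) if either else 0.0
        pairs[(a, b)] = (len(shared), jaccard)
    return {"union_size": len(union), "pairs": pairs}


if __name__ == "__main__":
    # Hypothetical toy IDs for illustration only; in practice use load_ids(<your file>).
    flagged = {
        "uchime": {"seq1", "seq2", "seq3"},
        "perseus": {"seq2", "seq3", "seq4"},
        "decipher": {"seq5"},
    }
    report = compare_detectors(flagged)
    print("union:", report["union_size"])
    for (a, b), (n_shared, jac) in report["pairs"].items():
        print(f"{a} vs {b}: shared={n_shared}, Jaccard={jac:.2f}")
```

A Jaccard index near 1 means two tools largely agree; values near 0 mean their calls barely overlap, which is exactly the situation described above.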
Perhaps I’m too picky (can one even be?), but since I’m investigating diversity in quasi-pristine, remote areas, I’d like to do this as correctly as possible. I don’t want to inflate the diversity (or abundances!), but I don’t want to lose (too much) information either. Removing low-abundance sequences is not an option (and, indeed, chimeras do not necessarily have low abundances).
Testing Uchime (de novo + Silva), Perseus and Decipher gives the following numbers of chimeras:
Uchime (U): 471
Silva (S): 284
Perseus (P): 577
Decipher (D): 666
Combining them yields the following numbers of unique chimeras (I didn’t do all combinations, but you get the idea):
U+S: 661
U+P: 701
U+D: 978
U+P+D: 1151
D+S: 867
S+U+P: 865
S+U+P+D: 1304
out of 6126 unique sequences. I didn’t try Chimera Slayer either, and I still have to test the impact of these combinations on the total retained sequences and OTU abundances.
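Because the size of a union equals the two single counts minus what they share (|A ∩ B| = |A| + |B| − |A ∪ B|), the pairwise overlaps can be backed out of the numbers above without rerunning anything. A minimal Python sketch; the dictionary layout and the function name are my own, hypothetical choices:

```python
# Counts reported above: unique sequences flagged per method and per pairwise union.
SINGLE = {"U": 471, "S": 284, "P": 577, "D": 666}
PAIR_UNION = {("U", "S"): 661, ("U", "P"): 701, ("U", "D"): 978, ("D", "S"): 867}
ALL_FOUR_UNION = 1304
TOTAL_UNIQUE = 6126


def pairwise_overlaps(single, pair_union):
    """Inclusion-exclusion for two sets: |A ∩ B| = |A| + |B| - |A ∪ B|."""
    return {(a, b): single[a] + single[b] - u for (a, b), u in pair_union.items()}


if __name__ == "__main__":
    for (a, b), shared in pairwise_overlaps(SINGLE, PAIR_UNION).items():
        union = PAIR_UNION[(a, b)]
        print(f"{a} & {b}: shared = {shared}, Jaccard = {shared / union:.3f}")
    print(f"S+U+P+D flags {ALL_FOUR_UNION / TOTAL_UNIQUE:.1%} of unique sequences")
```

With these counts, U and S share only 94 sequences, U and P share 347 (Jaccard ≈ 0.50), and the other listed pairs have a Jaccard index below 0.17; the full S+U+P+D union corresponds to about 21 % of the 6126 unique sequences.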
So, as chimeras are such a problem, isn’t even more scrutiny needed? Tests on mock/artificial (digital) communities showing that up to 90 % of chimeras are removed are one thing, but isn’t this huge difference in positives worrying?
After going through all the trouble of denoising, removing low-quality sequences, …, it appears that the choice of chimera detection algorithm can still dramatically affect your results.
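Counting flagged unique sequences says little about how many reads (and hence how much OTU abundance) each choice actually removes, so one way to gauge the impact is to weight each flagged unique sequence by its read count. A Python sketch, assuming you have a read count per unique sequence (for example from a names or count table); `retained_fraction` and the toy abundances are hypothetical:

```python
def retained_fraction(abundance, flagged):
    """Fraction of reads kept after removing flagged unique sequences.

    abundance: dict unique-sequence ID -> number of reads it represents
    flagged:   set of unique-sequence IDs called chimeric
    """
    total = sum(abundance.values())
    removed = sum(n for seq_id, n in abundance.items() if seq_id in flagged)
    return (total - removed) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical toy abundances; note that one flagged sequence is abundant.
    abundance = {"seq1": 120, "seq2": 3, "seq3": 40, "seq4": 1, "seq5": 2}
    candidate_sets = {
        "low-abundance calls": {"seq2", "seq4"},
        "includes an abundant call": {"seq1", "seq2"},
    }
    for label, ids in candidate_sets.items():
        print(f"{label}: {retained_fraction(abundance, ids):.1%} of reads retained")
```

Running this per detector (or per combination) shows whether the disagreements concern rare sequences or ones that carry a large share of the reads.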
- How many and which algorithms to combine? Where does it end?
- Is manually checking them to avoid removing too many false positives an option? How would one do this?
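One middle ground between taking the union of all outputs and trusting a single tool is a simple vote: call a sequence chimeric only if at least k of the n detectors flag it, and set the disagreements aside for manual inspection. This is a generic heuristic sketched here under that assumption, not a validated or published recommendation, and choosing k would still need a mock community or manual checks; `consensus_chimeras` and the toy data are hypothetical:

```python
from collections import Counter


def consensus_chimeras(flagged, min_votes):
    """Return IDs flagged by at least `min_votes` of the detectors.

    Voting heuristic (an assumption, not a validated method).
    flagged: dict mapping detector name -> iterable of flagged sequence IDs
    min_votes=1 reproduces the union; min_votes=len(flagged) the intersection.
    """
    votes = Counter()
    for ids in flagged.values():
        votes.update(set(ids))  # set() so a detector votes at most once per ID
    return {seq_id for seq_id, n in votes.items() if n >= min_votes}


if __name__ == "__main__":
    # Hypothetical toy calls from three detectors.
    flagged = {
        "uchime": {"seq1", "seq2", "seq3"},
        "perseus": {"seq2", "seq3", "seq4"},
        "decipher": {"seq3", "seq5"},
    }
    for k in range(1, len(flagged) + 1):
        print(k, sorted(consensus_chimeras(flagged, k)))
    # Sequences flagged by exactly one detector: a shortlist for manual checking.
    single_calls = consensus_chimeras(flagged, 1) - consensus_chimeras(flagged, 2)
    print("review manually:", sorted(single_calls))
```

Raising k trades sensitivity for specificity; the single-detector calls are the ones where a manual look (or an extra reference-based check) is most informative.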
As I dived into pyrosequencing without any previous hands-on experience with Sanger sequences, manual alignment, chimera checking, …, I am somewhat handicapped: I lack the insights most of you have, which limits my view on things. So if I’m wrong, please correct me.
Do any of you have experience with DECIPHER? Combining Uchime and Decipher should remove 89 % of all chimeras (according to their datasets, of course).
And while we’re at it: you (Pat) still advise checking the alignment manually. Can you recommend a program that can deal with these numbers of sequences (and MB/GB file sizes)? BioEdit, ClustalX and MEGA all crash …
Thanks for any input! :geek: