Hi, I’ve noticed that mothur offers several chimera identification methods, such as chimera.slayer and chimera.uchime.
Both can use either a database file as the reference or the data set itself.
I’m confused about which method is best. Probably none of them is perfect, but which one has the advantage in practical use? (I mean in terms of the output, not the run time.)
For my specific data, chimera.slayer generates ~9k chimera sequences, while chimera.uchime generates ~16k chimera sequences. Not sure which one to use…
And another question about the command itself: when I use chimera.uchime(fasta=ABCD.fasta,reference=self,name=ABCD.names) I cannot use the processors option; mothur reports that only one processor will be used. However, if I use chimera.uchime(fasta=ABCD.fasta,reference=ABCD.fasta,name=ABCD.names), I can add something like processors=10 and the program runs without an error message. But I wonder whether the result will be the same? I haven’t tried it yet, but if anyone knows the answer or the explanation, please let me know. Much appreciated!
Great questions. In our hands (and others’), chimera.uchime has better sensitivity than chimera.slayer. With reference=self, we sort the sequences by abundance, which is why you need the name data. reference=self uses only 1 processor because, to judge each sequence, the command needs to know whether the more abundant sequences are chimeric or not, so the sequences must be processed in series and the work cannot be parallelized. It makes sense for reference=self to do better than reference=silva.gold.align because you are using your actual data to determine which sequences are chimeric. The gold database is limited in that it is based on cultured sequences, which would limit your ability to detect chimeras from the Archaea, TM7, or some other odd group of organisms or genes.
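To illustrate why the abundance-sorted, self-referenced check has to run in series, here is a minimal Python sketch. It is not mothur's or UCHIME's actual code: the function names `is_toy_chimera` and `denovo_check` are hypothetical, and the chimera test is a deliberately crude stand-in (exact prefix of one parent joined to the exact suffix of another) rather than UCHIME's scoring. The only point is the loop structure, where each decision depends on all the earlier ones.

```python
def is_toy_chimera(query, parents, min_part=3):
    """Toy stand-in for a real chimera test (NOT UCHIME's scoring).

    Flags `query` when it equals the start of one parent joined to the
    end of a different parent at some breakpoint.
    """
    for left in parents:
        for right in parents:
            if left == right or len(left) != len(query) or len(right) != len(query):
                continue
            for cut in range(min_part, len(query) - min_part + 1):
                if query[:cut] == left[:cut] and query[cut:] == right[cut:]:
                    return True
    return False


def denovo_check(abundance):
    """Process sequences from most to least abundant, strictly in order.

    `abundance` maps each unique sequence to its read count.
    """
    ordered = sorted(abundance, key=lambda s: (-abundance[s], s))
    references, chimeras = [], []
    for seq in ordered:
        # The reference set depends on every earlier decision, so this
        # loop cannot simply be split across processors.
        if is_toy_chimera(seq, references):
            chimeras.append(seq)
        else:
            references.append(seq)
    return references, chimeras


if __name__ == "__main__":
    counts = {
        "AAAAAAAAAA": 100,  # abundant parent
        "CCCCCCCCCC": 80,   # abundant parent
        "AAAAACCCCC": 5,    # rare sequence built from both parents
        "GGGGGGGGGG": 3,    # rare but unrelated sequence
    }
    refs, chims = denovo_check(counts)
    print("references:", refs)
    print("chimeras:", chims)
```

In this toy setup, giving the mixed sequence a higher count than one of its parents means it is examined before that parent is in the reference set and is not flagged, which is one way to see why the abundance information matters.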
chimera.uchime(fasta=ABCD.fasta,reference=self,name=ABCD.names) I cannot use the processors option; mothur reports that only one processor will be used
That is correct.
if I use chimera.uchime(fasta=ABCD.fasta,reference=ABCD.fasta,name=ABCD.names), I can add something like processors=10 and the program runs without an error message. But I wonder whether the result will be the same?
Hmm… This is not how we intended it to be used, and I suspect mothur thinks you are running in database mode, so it goes along with processors=10 and ignores the names data. We are adding some error checking to make sure that you can’t run it this second way. Thanks for kicking the tires for us!
If you use your own data as the reference for this method, does it make sense to run it before alignment, or to remove the gaps introduced during alignment to the database before running it?
I would do it after you have aligned your sequences, trimmed them to length (with filter.seqs), and run unique.seqs and pre.cluster so you “know” the actual abundances of each sequence type. uchime will automatically remove the gaps from the sequences since it does its own aligning.
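As a rough illustration of the two ideas in that answer (gaps being stripped before the check, and abundances coming from the names data), here is a small Python sketch. The helper names `degap` and `abundances_from_names` are hypothetical and this is not how mothur implements either step. It assumes the usual names-file layout of a representative name, a tab, and a comma-separated list of the reads it represents, and that '-' and '.' are the gap characters in the alignment.

```python
def degap(aligned_seq):
    """Strip alignment gap characters ('-' and '.') from one sequence."""
    return aligned_seq.replace("-", "").replace(".", "")


def abundances_from_names(lines):
    """Count the reads behind each representative in names-file lines.

    Assumes each line is: representative name, a tab, then a
    comma-separated list of all read names it represents.
    """
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        representative, members = line.split("\t")
        counts[representative] = len(members.split(","))
    return counts


if __name__ == "__main__":
    print(degap("..AC-GT--A.."))
    print(abundances_from_names(["seq1\tseq1,seq2,seq3", "seq4\tseq4"]))
```

Because the counts come from the names data as it stands after unique.seqs and pre.cluster, running the chimera check at that point gives it the most complete abundance information.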