We are pleased to be releasing mothur v.1.19.0; however, we wish it were under better circumstances. Recently it came to our attention that the testing approach we were using to measure our ability to detect true chimeras (sensitivity) while limiting the risk of falsely calling sequences chimeric (specificity) was flawed. It turns out that while we had significantly increased the sensitivity compared to the original version of ChimeraSlayer (http://microbiomeutil.sourceforge.net/), we had also increased the false positive rate. For example, using the test set available with the original ChimeraSlayer on test sequences that were 200 bp, the sensitivity increased from 63% to 70% and the specificity decreased from 99.4% to 87.4%. With longer sequences the difference in sensitivity and specificity increases. This approach to measuring specificity is conservative; however, it is much more reliable than what we had been doing. In summary, while chimera.slayer did a better job of identifying true chimeras, it also mistakenly called many good sequences chimeric. The practical upshot is that if you used the default settings in chimera.slayer, it is likely that extra sequences were removed.
Since learning of this problem, we have been modifying our version of chimera.slayer to be more faithful to the original ChimeraSlayer. At this point, the main differences between ours and the original are the ability to use any reference alignment, significantly improved speed, and integration within mothur. We are removing the distance and kmer search methods for the time being and will do as the original does and use megablast to do the parent searching.
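To make the two measures concrete, here is a minimal Python sketch of how sensitivity and specificity are computed from a labelled test set. The counts are hypothetical (1000 known chimeras and 1000 known good sequences, chosen only so that the percentages match the ones quoted above); they are not the actual sizes of the ChimeraSlayer test set.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real chimeras that were flagged as chimeric."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of good sequences that were correctly left alone."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, assuming 1000 chimeras and 1000 good sequences.
results = {
    "original ChimeraSlayer": dict(tp=630, fn=370, tn=994, fp=6),
    "modified chimera.slayer": dict(tp=700, fn=300, tn=874, fp=126),
}

for name, c in results.items():
    sens = sensitivity(c["tp"], c["fn"])
    spec = specificity(c["tn"], c["fp"])
    print(f"{name}: sensitivity={sens:.1%}, specificity={spec:.1%}, "
          f"good sequences wrongly flagged={c['fp']}")
```

With these made-up counts, the drop in specificity from 99.4% to 87.4% corresponds to 126 rather than 6 good sequences being flagged out of every 1000, which is why the change matters more than the percentages might suggest.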
This is a pretty critical update, and we insist that everyone update their version of mothur. Please spread the word to colleagues who have not received this notice. We know you trust us to give you a robust and reliable software package and hope you can accept our apology for any problems this may cause in your analysis. Feel free to contact me if you have any questions, concerns, or flames.
Is this new way of running chimera.slayer a substantial increase in computing effort? I have tried running the program a few times since the release of v.1.19 and it takes many times longer to run. For example, previously I could run chimera.slayer in a few hours on my dataset… the other night I ran the program for 8 hours and it only made it thru 1600 sequences (out of 20,000+).
I am just trying to find out if it is something I am doing incorrectly or if this is an appropriate increase in time given the new approach.
Yeah, unfortunately it does seem to take a bit longer than before. But 8 hrs to do 1600 sequences is a bit much - is this a mac, pc or linux? I was just able to do 1000 on 2 processors in 15 min. We think the problem is that because it is running blast external to mothur many times, it is getting slowed down. We are currently trying to incorporate NCBI’s tools so that blast is built into mothur and will run much faster. We’ll get this updated asap…
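As a rough back-of-the-envelope comparison of the two rates mentioned in this thread, the Python sketch below extrapolates each one to a 20,000-sequence dataset. It assumes the rate stays constant over the whole run and ignores differences in hardware, sequence length, and dataset, so treat the result as an order-of-magnitude estimate only.

```python
def rate_per_hour(n_sequences, hours):
    """Sequences processed per hour at the observed pace."""
    return n_sequences / hours


def hours_to_finish(total_sequences, n_sequences, hours):
    """Projected run time if the observed pace held for the whole dataset."""
    return total_sequences / rate_per_hour(n_sequences, hours)


TOTAL = 20000  # the dataset was described as "20,000+", so this is a lower bound

# Pace reported by the user: 1600 sequences in 8 hours.
print(rate_per_hour(1600, 8.0), hours_to_finish(TOTAL, 1600, 8.0))

# Pace reported by the developers: 1000 sequences in 15 minutes.
print(rate_per_hour(1000, 0.25), hours_to_finish(TOTAL, 1000, 0.25))
```

At the reported pace the full run would take on the order of 100 hours, versus roughly 5 hours at the developers' pace, a 20-fold gap.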
I thought it might be related to the megablast program running separately.
These are titanium sequences being run on my macbook (using 2 processors)… I was running everything at home so perhaps my internet is a bit slow there and was partially responsible for slowing things down even more. I will just run it and let it go till it is finished. I was mainly hesitant to run it for a day or two and then have it give me an error at the end.
I had already noticed problems with chimera.slayer after it removed >60% of my sequences.
Therefore, I installed the new version (1.19.0), but when I tried to re-run chimera.slayer it gave me an error (I have included it at the end of this message). The only way I can stop the computer from spinning is to go into the System Monitor and kill the processes.
My colleague Joyce also tried compiling the new version of mothur from the source code, but unfortunately she got the same error.
Additionally, I couldn’t find a megablast executable on the NCBI website, and installing the BLAST 2.2.25+ standalone program didn’t resolve the problem either.
We are both running mothur on linux.
Is there a link where we can find megablast? I couldn’t find one using the search function on the mothur wiki. If there is one, could someone please post it?
Thank you very much for your time addressing my issue etc. I really depend upon the mothur program for my work.
Emily
mothur > chimera.slayer(fasta=beard.trim.good.align, template=silva.gold.align, processors=6)
Checking sequences from beard.trim.good.align …
Reading sequences from silva.gold.align…Done.
sh: ./blast/bin/formatdb: not found
Only reporting sequence supported by 90% of bootstrapped results.
sh: ./blast/bin/megablast: not found
sh: ./blast/bin/megablast: not found
sh: ./blast/bin/megablast: not found
sh: ./blast/bin/megablast: not found
sh: ./blast/bin/megablast: not found
sh: ./blast/bin/megablast: not found
[ERROR]: Could not open beard.trim.good.slayer.chimera2570.num.temp
[ERROR]: Could not open beard.trim.good.slayer.chimera2571.num.temp
[ERROR]: Could not open beard.trim.good.slayer.chimera2573.num.temp
[ERROR]: Could not open beard.trim.good.slayer.chimera2574.num.temp
[ERROR]: Could not open beard.trim.good.slayer.chimera2578.num.temp
[ERROR]: Could not open beard.trim.good.slayer.chimera2580.num.temp
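The "not found" lines in this log suggest that the shell is being asked to run formatdb and megablast from a blast/bin folder relative to the current working directory. As far as I know, both names belong to the older (legacy) NCBI BLAST package, whereas BLAST+ ships differently named programs such as makeblastdb and blastn, which may be why installing BLAST 2.2.25+ did not help. Below is a small Python sketch that checks whether the two files exist and are executable before a long run is started. The ./blast/bin location is taken from the log above; how mothur actually resolves that path is an assumption, and the helper name is made up for illustration.

```python
import os

# Executable names taken from the "not found" lines in the log above.
REQUIRED_TOOLS = ["formatdb", "megablast"]


def missing_blast_tools(base_dir=".", names=REQUIRED_TOOLS):
    """Return expected tool paths that are absent or not executable.

    Assumes the tools live in <base_dir>/blast/bin, as the log suggests.
    """
    missing = []
    for name in names:
        path = os.path.join(base_dir, "blast", "bin", name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Run this from the directory in which mothur is started.
    problems = missing_blast_tools()
    if problems:
        for path in problems:
            print("missing or not executable:", path)
    else:
        print("formatdb and megablast were found under ./blast/bin")
```

If the check reports missing files, the "Could not open ... num.temp" errors are most likely a downstream symptom of megablast never having run, not a separate problem.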
If you are using the new Costello tutorial, this could be your problem. The text says “We want to run chimera.slayer on stool.trim.unique.good.filter.fasta”, but Pat does not actually run the filter.seqs command before running the chimera.slayer command. As a consequence, the file you are using is the align file, with all the gaps, which makes the file very large.
So try running the filter.seqs command first; hopefully this will help.
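To illustrate why filtering shrinks the file, here is a toy Python sketch that drops every alignment column in which all sequences have a gap character. It only demonstrates the idea; it is not mothur's implementation, the function name is made up, and filter.seqs has its own options that are described on the mothur wiki.

```python
def drop_all_gap_columns(aligned, gap_chars="-."):
    """Remove columns in which every sequence has a gap character.

    `aligned` maps sequence names to aligned strings of equal length.
    """
    if not aligned:
        return {}
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(aligned[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    # Keep a column if at least one sequence has a real base there.
    keep = [i for i in range(length)
            if any(aligned[n][i] not in gap_chars for n in names)]
    return {n: "".join(aligned[n][i] for i in keep) for n in names}


# Toy alignment: 11 columns, most of which are gaps in both sequences.
toy = {
    "seq1": "..AC--G-T..",
    "seq2": "..A---GTT..",
}

for name, seq in drop_all_gap_columns(toy).items():
    print(name, seq, len(seq))
```

In the toy example the alignment shrinks from 11 columns to 5 without losing any bases. Real reference-aligned files contain far more gap-only columns than this, so the saving in file size and run time can be much larger.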