chimera.slayer error

Hi Pat,

I have been trying to use chimera.slayer to remove potential chimeras from my data, however I get the error “Error, when I trim your sequences, the entire sequence is trimmed.” I have used the trim.seqs command to trim the sequences in the file to between 300 and 450bp. I am quite a novice at this, and so am not entirely sure what is going wrong. (I have checked the forums and couldn’t find a similar problem, but I apologise if this has already been covered).

Also, I know this is not really the best section for this, but is there a way of resampling fasta files to standardise the sequences across a series of files (as daisychopper does)?

thanks,
Tris

I have been trying to use chimera.slayer to remove potential chimeras from my data, however I get the error “Error, when I trim your sequences, the entire sequence is trimmed.” I have used the trim.seqs command to trim the sequences in the file to between 300 and 450bp. I am quite a novice at this, and so am not entirely sure what is going wrong. (I have checked the forums and couldn’t find a similar problem, but I apologise if this has already been covered).

It would be helpful if you could post the result of the summary.seqs command using the fasta file that you are trimming. This is a common issue, so don’t worry - you aren’t alone in this.

Also, I know this is not really the best section for this, but is there a way of resampling fasta files to standardise the sequences across a series of files (as daisychopper does)?

Not yet. I’m not totally convinced that it’s necessary if people pick “good” beta-diversity metrics that are functions of relative abundance. In other words, don’t use Bray-Curtis; rather, use Morisita-Horn or ThetaYC. But it’s on the list of features to add…
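
To illustrate the point about relative-abundance metrics, here is a minimal Python sketch. It uses the textbook forms of the three indices, which are not necessarily identical to mothur's own calculators, and the two count vectors are made-up data: the same community sampled at two different depths.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity computed on raw counts."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    return 1.0 - 2.0 * shared / (sum(x) + sum(y))


def _proportions(counts):
    """Convert raw counts to relative abundances."""
    total = sum(counts)
    return [c / total for c in counts]


def morisita_horn(x, y):
    """Morisita-Horn dissimilarity (1 - similarity) from relative abundances."""
    p, q = _proportions(x), _proportions(y)
    cross = sum(a * b for a, b in zip(p, q))
    pp = sum(a * a for a in p)
    qq = sum(b * b for b in q)
    return 1.0 - 2.0 * cross / (pp + qq)


def theta_yc(x, y):
    """Yue-Clayton theta dissimilarity (1 - similarity) from relative abundances."""
    p, q = _proportions(x), _proportions(y)
    cross = sum(a * b for a, b in zip(p, q))
    pp = sum(a * a for a in p)
    qq = sum(b * b for b in q)
    return 1.0 - cross / (pp + qq - cross)


# Hypothetical data: identical community structure, 10x difference in depth.
shallow = [60, 30, 10]
deep = [600, 300, 100]

for name, metric in [("Bray-Curtis", bray_curtis),
                     ("Morisita-Horn", morisita_horn),
                     ("ThetaYC", theta_yc)]:
    print(name, round(metric(shallow, deep), 3))
```

With these inputs Bray-Curtis reports a dissimilarity of about 0.82 purely because of the difference in depth, while the two relative-abundance metrics report essentially zero.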

Hi Pat,

Thanks for the quick response.

This is the terminal output from my efforts with mothur so far.


mothur v.1.12.0 Last updated: 7/23/2010

by
Patrick D. Schloss

Department of Microbiology & Immunology
University of Michigan
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type ‘help()’ for information on the commands that are available

Type ‘quit()’ to exit program



mothur > trim.seqs(fasta=one1050R.fas, minlength=300, maxlength=450)
Output File Names: one1050R.trim.fasta one1050R.scrap.fasta one1050R.trim.qual one1050R.scrap.qual
mothur > unique.seqs(fasta=one1050R.trim.fasta)
Output File Names: one1050R.trim.unique.fasta one1050R.trim.names
mothur > pre.cluster(fasta=one1050R.trim.unique.fasta, name=one1050R.trim.names)

0 1881 15
100 1841 55
200 1838 58
300 1832 64
400 1822 74
500 1805 91
600 1804 92
700 1803 93
800 1802 94
900 1798 98
1000 1791 105
1100 1791 105
1200 1785 111
1300 1779 117
1400 1743 153
1500 1724 172
1600 1722 174
1700 1719 177
1800 1717 179
Total number of sequences before precluster was 1896.
pre.cluster removed 180 sequences.

Output File Names:
one1050R.trim.unique.precluster.fasta
one1050R.trim.unique.precluster.names


mothur > align.seqs(candidate=one1050R.trim.unique.precluster.fasta, template=silva.bacteria.fasta)
Reading in the silva.bacteria.fasta template sequences... DONE.
Aligning sequences from one1050R.trim.unique.precluster.fasta ...
100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1716
Some of you sequences generated alignments that eliminated too many bases, a list is provided in one1050R.trim.unique.precluster.flip.accnos. If you set the flip parameter to true mothur will try aligning the reverse compliment as well.
It took 31 secs to align 1716 sequences.

Output File Names:
one1050R.trim.unique.precluster.align
one1050R.trim.unique.precluster.align.report
one1050R.trim.unique.precluster.flip.accnos

mothur > chimera.slayer(fasta=one1050R.trim.unique.precluster.align, template=silva.bacteria.fasta)

Checking sequences from one1050R.trim.unique.precluster.align …
Reading sequences from silva.bacteria.fasta…Done.
Reading sequences from one1050R.trim.unique.precluster.align…Done.

Only reporting sequence supported by 90% of bootstrapped results.
Error, when I trim your sequences, the entire sequence is trimmed.
petros:one1050R Tristrom$




I don't know if this helps, but I'm using a MacBook with a 2.2 GHz Core 2 Duo and 4 GB of RAM.

Also, with regard to the resampling: I have several samples I want to compare. One has, say, 6000 reads, one 4000 and one 5500. I would like to randomly resample from the two larger ones, extracting 4000 reads from each, so that all the samples are comparable. Is that possible in mothur? Someone else did it for me in daisychopper (perl), but I don’t know how to use that, and I would love to be able to do everything in mothur.

thanks for your help,
Tris
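
For readers wanting the resampling described above outside of mothur, the following is a minimal Python sketch, not a mothur feature and not daisychopper's actual implementation. The function names and the fixed seed are assumptions made for illustration. It draws reads without replacement so that every sample ends up with as many reads as the smallest one.

```python
import random


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def subsample_to_smallest(samples, seed=0):
    """Randomly draw (without replacement) the same number of reads per sample.

    `samples` maps a sample name to its list of records; the target depth
    is the size of the smallest sample.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    depth = min(len(records) for records in samples.values())
    return {name: rng.sample(records, depth)
            for name, records in samples.items()}


# Hypothetical usage with three in-memory samples of 6, 4 and 5 reads.
samples = {
    "A": [("A%d" % i, "ACGT") for i in range(6)],
    "B": [("B%d" % i, "ACGT") for i in range(4)],
    "C": [("C%d" % i, "ACGT") for i in range(5)],
}
even = subsample_to_smallest(samples, seed=42)
print({name: len(records) for name, records in even.items()})
```

Note that the draw should be made from the full set of reads rather than from a dereplicated (unique.seqs) file, otherwise abundant sequences are under-represented.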

Sorry, I forgot to add this in.


mothur > summary.seqs(fasta=one1050R.fas)
             Start  End  NBases  Ambigs  Polymer
Minimum:     1      200  200     0       3
2.5%-tile:   1      226  226     0       4
25%-tile:    1      348  348     0       5
Median:      1      409  409     0       5
75%-tile:    1      453  453     0       6
97.5%-tile:  1      535  535     0       6
Maximum:     1      561  561     0       7
# of Seqs:   3342

Output File Name:
one1050R.fas.summary

So you are running things a bit out of order. You need to run…
mothur > trim.seqs(fasta=one1050R.fas, minlength=300, maxlength=450)
mothur > unique.seqs(fasta=one1050R.trim.fasta)
mothur > align.seqs(candidate=one1050R.trim.unique.fasta, template=silva.bacteria.fasta)
mothur > screen.seqs(fasta=one1050R.trim.unique.align, start=???, end=???, name=one1050R.trim.names, group=one1050R.groups)
mothur > filter.seqs(fasta=one1050R.trim.unique.good.align)
mothur > unique.seqs(fasta=one1050R.trim.unique.good.filter.fasta, name=one1050R.trim.good.names)
mothur > pre.cluster(fasta=one1050R.trim.unique.good.filter.unique.fasta, name=one1050R.trim.unique.good.filter.names, diffs=??)
mothur > chimera.slayer(fasta=one1050R.trim.unique.good.filter.fasta, template=silva.gold.fasta)

You might consult the example analysis we have posted for the best way to process sequences at http://www.mothur.org/wiki/Costello_stool_analysis. The problem is probably that your sequences don’t overlap fully with each other, which will be a problem for any analysis you run. If you want, please send me the result of the following command…

mothur > summary.seqs(fasta=one1050R.trim.unique.precluster.align)

Hope this helps,
Pat
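
To make the overlap problem concrete, here is a small Python sketch. It is an illustration under stated assumptions, not mothur's internal logic: it assumes that "." and "-" mark gaps in an aligned sequence, and the function names and toy sequences are hypothetical. It finds the alignment columns each read covers and the region shared by all reads. When no shared region exists, trimming every read to a common window would leave nothing, which is one plausible reading of the error message above.

```python
GAP_CHARS = ".-"  # assumed gap symbols in the aligned fasta file


def aligned_span(aligned_seq):
    """Return the 1-based (start, end) columns of the first and last base.

    Returns None when the sequence consists only of gap characters.
    """
    positions = [i for i, ch in enumerate(aligned_seq, start=1)
                 if ch not in GAP_CHARS]
    if not positions:
        return None
    return positions[0], positions[-1]


def common_region(aligned_seqs):
    """Return the (start, end) window covered by every read, or None."""
    spans = [span for span in map(aligned_span, aligned_seqs)
             if span is not None]
    if not spans:
        return None
    start = max(s for s, _ in spans)  # latest start among the reads
    end = min(e for _, e in spans)    # earliest end among the reads
    return (start, end) if start <= end else None


# Toy alignment: two reads that overlap, plus one that does not.
overlapping = ["..ACGTAC..", "....GTACGT"]
disjoint = ["ACGTAC....", "......ACGT"]

print(common_region(overlapping))
print(common_region(disjoint))
```

The start and end values printed for real data are the kind of numbers that summary.seqs reports and that the start= and end= options of screen.seqs filter on.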

Hi Pat,

Thanks for the help with that. It worked after the screen.seqs step. Also, I had sequenced with the reverse primer and had not run reverse.seqs beforehand!

Cheers,
Tris

Hi Pat,

I have a question regarding the chimera.slayer output. After getting it to work, I proceeded to analyse my data on two different Macs to speed things up. After installing the latest release of mothur for Mac (v1.12.3), I noticed that many more chimeras were being found in the same data than with v1.12.0, which I was originally using. Is this normal? For example, in a file with 2155 sequences, v1.12.0 found 55 chimeras and v1.12.3 found 767. That seems a little strange to me. The data was processed in the manner you described in the previous post, so I am not sure why such a large discrepancy is occurring.

thanks,
Tris

Ah, it looks like a feature for 1.13 was added to one of the patches to 1.12. We changed the default from minsnp=10 to minsnp=100. As you noticed, the number of detected chimeras goes up. Using the testing platform described by the original developers, minsnp=100 does a better job of detecting chimeras without producing additional false positives. Also, I just noticed that I had a typo regarding the database that should be used. We have found that the silva.bacteria.fasta database will increase the number of false positives. Instead, please use the silva-aligned gold database. The gold database was created by the original developers. The only difference is that theirs is aligned to the greengenes reference and ours is aligned to the silva reference.

Hopefully, it goes without saying that when it comes to putative chimeras, you should do some due diligence and double check that chimera calls make sense. Although not published yet, the developers report (and I can confirm) that about 20% or fewer of the reads in a 454 run are chimeric under normal PCR conditions. So while 767 of 2155 looks like a lot (36%), keep in mind that if you are using a name file, then most of the 767 sequences are probably singletons and doubletons and the other 1400 sequences are higher frequency. This should drop the percentage below 20%.
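
The difference between the fraction of unique sequences flagged and the fraction of total reads flagged can be checked with a short Python sketch. It assumes the usual two-column names file layout (representative name, then a comma-separated list of the reads it represents). The function names and the toy data are made up for illustration and are not taken from this dataset.

```python
def reads_per_representative(names_text):
    """Map each representative sequence to the number of reads it stands for."""
    counts = {}
    for line in names_text.splitlines():
        if not line.strip():
            continue
        representative, members = line.split()[:2]
        counts[representative] = len(members.split(","))
    return counts


def chimeric_fraction(counts, chimeric_reps):
    """Fraction of all reads whose representative was flagged as chimeric."""
    flagged = sum(counts[rep] for rep in chimeric_reps)
    return flagged / sum(counts.values())


# Hypothetical names file: one abundant sequence and three singletons.
names_text = (
    "seqA\tseqA,r1,r2,r3,r4,r5,r6\n"
    "seqB\tseqB\n"
    "seqC\tseqC\n"
    "seqD\tseqD\n"
)
flagged = ["seqB", "seqC"]  # made-up chimera calls

counts = reads_per_representative(names_text)
print("unique sequences flagged:", len(flagged) / len(counts))
print("total reads flagged:", chimeric_fraction(counts, flagged))
```

In this toy case half of the unique sequences are flagged, but they account for only 2 of 10 reads, which is the effect described above.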

Hi Pat,

Thanks for the quick reply. That now makes sense, I was panicking for a while at the huge jump!

thanks again,
Tris