Chimera.check

Hi, I ran the chimera.check command, but I don't understand what the result file means.
How can I identify chimeras from the "IS scores"?

Sorry for the sparseness of our documentation, but you should be able to see what the column headings mean on the wiki (http://www.mothur.org/wiki/Chimera.slayer). Also, the sequences that are called chimeras, with the minimum level of bootstrap support, are reported in the *accnos file that is generated.
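As an illustration of how the *accnos file can be used outside mothur, here is a minimal Python sketch. It assumes the accnos file lists one sequence name per line and that FASTA names are the first whitespace-delimited token of each header; the function names and the toy data are made up for the example and are not part of mothur.

```python
from io import StringIO


def read_accnos(handle):
    """Return the set of sequence names listed one per line (assumed layout)."""
    return {line.split()[0] for line in handle if line.strip()}


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first token of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_accnos(fasta_handle, flagged):
    """Separate sequences into those not flagged and those flagged."""
    kept, removed = [], []
    for name, seq in read_fasta(fasta_handle):
        (removed if name in flagged else kept).append((name, seq))
    return kept, removed


# Toy data standing in for a real *.accnos file and FASTA file.
accnos = StringIO("seqB\n")
fasta = StringIO(">seqA\nACGT\n>seqB\nTTGA\n>seqC\nGGCC\n")

flagged = read_accnos(accnos)
kept, removed = split_by_accnos(fasta, flagged)
print("kept:", [name for name, _ in kept])
print("flagged as chimeric:", [name for name, _ in removed])
```

Within mothur itself, remove.seqs with the accnos option does the same job.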

Thanks very much.
I have another question, about filter.seqs.
mothur > summary.seqs(fasta=caohai ch.fasta)

             Start  End   NBases  Ambigs  Polymer
Minimum:     1      392   392     0       4
2.5%-tile:   1      552   552     0       4
25%-tile:    1      1136  1136    0       5
Median:      1      1185  1185    0       5
75%-tile:    1      1228  1228    0       6
97.5%-tile:  1      1275  1275    0       7
Maximum:     1      1544  1544    2       8

# of Seqs: 124 (the longest one is the U00096 Escherichia coli 16S rRNA gene sequence, which is a reference sequence defined in the "Mallard" software, another program used for chimera identification)

Output File Name:
caohai ch.fasta.summary

mothur > align.seqs(candidate=caohai ch.fasta,template=core_set_aligned.imputed.fasta)

Reading in the core_set_aligned.imputed.fasta template sequences… DONE.
Aligning sequences from caohai ch.fasta …
100
We’re into D 668 154
We’re into F 2032 420
124
Some of you sequences generated alignments that eliminated too many bases, a list is provided in caohai ch.flip.accnos. If you set the flip parameter to true mothur will try aligning the reverse compliment as well.
It took 9 secs to align 124 sequences.
mothur > summary.seqs(fasta=caohai ch.align)

             Start  End   NBases  Ambigs  Polymer
Minimum:     0      0     0       0       1
2.5%-tile:   109    1973  62      0       3
25%-tile:    109    5716  1099    0       5
Median:      109    5887  1180    0       5
75%-tile:    109    6054  1224    0       6
97.5%-tile:  6499   6849  1263    0       7
Maximum:     6783   6850  1535    0       8

# of Seqs: 124

mothur > screen.seqs(fasta=caohai ch.align,start=112,end=4100)
mothur > summary.seqs(fasta=caohai ch.good.align)

             Start  End   NBases  Ambigs  Polymer
Minimum:     108    4153  810     0       4
2.5%-tile:   109    4442  920     0       4
25%-tile:    109    5809  1151    0       5
Median:      109    5888  1189    0       5
75%-tile:    109    6052  1229    0       6
97.5%-tile:  112    6213  1278    0       7
Maximum:     112    6319  1535    0       8

# of Seqs: 110

mothur > filter.seqs(fasta=caohai ch.good.align,trump=.,vertical=T)
Creating Filter…
100
Sequences are not all the same length, please correct.
110
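
The "Sequences are not all the same length" message means the aligned sequences do not all span the same number of columns. One way to find the offenders is to count columns per sequence, as in the minimal Python sketch below. The function names and toy alignment are illustrative only; gap characters ("-" and ".") are counted as columns. This locates odd-length entries but does not say why they occurred.

```python
from collections import Counter
from io import StringIO


def aligned_lengths(handle):
    """Map each sequence name to its number of alignment columns."""
    lengths, name = {}, None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            lengths[name] = 0
        elif name is not None:
            # Gap characters are part of the alignment, so count them too.
            lengths[name] += len(line)
    return lengths


def odd_length_sequences(lengths):
    """Return names whose length differs from the most common length."""
    if not lengths:
        return []
    common_length, _ = Counter(lengths.values()).most_common(1)[0]
    return sorted(n for n, length in lengths.items() if length != common_length)


# Toy alignment: seq3 is shorter than the other two.
toy = StringIO(">seq1\n..AC-GT..\n>seq2\n.TAC-GTA.\n>seq3\nACGT\n")
lengths = aligned_lengths(toy)
print(lengths)
print("odd lengths:", odd_length_sequences(lengths))
```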


By the way, if I want to remove chimeras, what should I do first? Should I run chimera.slayer after align.seqs or after filter.seqs? Are there differences between the chimeras identified by mothur and by the RDP online analysis?

In general, I’d say to filter after chimera checking, and I’d advise against using the core_set sequence alignment. Also, note that Mallard is the same as Pintail. Would you mind emailing us your caohai ch.fasta file so we can see what’s going on with the weird output?

thanks,
pat

Thanks, I have emailed caohai ch.fasta to you (pschloss@umich.edu) with the subject "caihai ch.fasta". Could you then send me the strategy you used? I am looking forward to your response.
Are there differences between the results from mothur and from Mallard?
Thanks very much!

^ Has this issue been solved? If so, please post here the steps I would take to solve it. Thanks.

Thanks for your response! The issue hasn’t been solved, so could you give me some suggestions?
Thanks again.

Sorry to bother you, but I didn’t receive your response. Do you have any idea about the problem I ran into?
Thanks very much.

I removed Escherichia coli from your ch.fasta file and saved the file as ch1.fasta and then ran the following commands.

align.seqs(candidate=ch1.fasta, template=silva.bacteria.fasta, flip=t)
chimera.slayer(fasta=ch1.align, template=silva.bacteria.fasta)
remove.seqs(fasta=ch1.align, accnos=ch1.slayer.accnos)
summary.seqs(fasta=ch1.pick.align)
screen.seqs(fasta=ch1.pick.align, start=1044, minlength=1100)
filter.seqs(fasta=ch1.pick.good.align, trump=., vertical=t)

The resulting ch1.pick.good.filter.fasta contains 74 sequences.

I used chimera.slayer to detect chimeras, because we have found it to be the most effective chimera detection method, but alternatively you could use pintail which is the same as mallard. The command would look like: chimera.pintail(fasta=ch1.align, template=silva.bacteria.fasta). I hope this helps!

Thanks very much for your kind help. But I still have another question: I used "core_set_aligned.imputed.fasta" as the template. Is that OK? Are there any differences between "silva" and "core"? Actually, I don’t know which kinds of data are most appropriate for each of these templates.

I’m glad you’re getting stuff sorted out. You might consider checking out the Costello example analysis. Also, you might take a look at my PLoS ONE paper describing the aligner and my recent PLoS Comp. Biol. paper looking at the effect of alignment on distances. I think we’ve determined that the silva-based alignment is the way to go because it preserves the secondary structure much better than the greengenes core_set database. We use the silva reference alignment for aligning and the silva-based alignment of the gold database (available on the wiki) for running chimera.slayer.

Sorry to bother you again. I ran chimera.slayer in mothur and Mallard for chimera checking on the same sequences, but the results are significantly different: not only the number of chimeras, but also which sequences are flagged. Why? This has confused me a lot these days. Also, some of my classmates told me that if I shorten the sequences, the number of chimeras will decrease. I used the chop.seqs command to cut my sequences (the raw lengths are about 1400 bp) to 800 bp, 1000 bp, and 1200 bp. However, the 800 bp sequences gave the most chimeras and the 1400 bp sequences the fewest, and both the number of chimeras and the flagged sequences differed for each length.

Sorry to bother you again. I ran chimera.slayer in mothur and Mallard for chimera checking on the same sequences, but the results are significantly different: not only the number of chimeras, but also which sequences are flagged. Why?

Well, they use very different algorithms. Although not published yet (but it is submitted), chimera.slayer shows better ability to detect chimeras than pintail/mallard with a very low false positive rate.

This has confused me a lot these days. Also, some of my classmates told me that if I shorten the sequences, the number of chimeras will decrease. I used the chop.seqs command to cut my sequences (the raw lengths are about 1400 bp) to 800 bp, 1000 bp, and 1200 bp. However, the 800 bp sequences gave the most chimeras and the 1400 bp sequences the fewest, and both the number of chimeras and the flagged sequences differed for each length.

Such analyses are very difficult to interpret unless you do the same thing with synthetic chimeras where you know the break point. Your classmate is correct - as you decrease the sequence length you find fewer chimeras. Not because they have disappeared, but because they become harder to find. This too is backed up by the experiments performed by the original developers of chimera.slayer.
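
To make the point about synthetic chimeras concrete, here is a toy Python sketch. It joins two equal-length parent sequences at a known break point and then truncates the result. It is only an illustration of why shortening can hide a chimera; it is not how chimera.slayer or Pintail detect chimeras, and the function names and sequences are invented for the example.

```python
def make_chimera(parent_a, parent_b, breakpoint):
    """Join the start of parent_a to the end of parent_b at a known break point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 0 < breakpoint < len(parent_a):
        raise ValueError("breakpoint must fall inside the sequence")
    return parent_a[:breakpoint] + parent_b[breakpoint:]


def mismatches(seq_a, seq_b):
    """Count positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


# Deliberately simple parents so the effect is easy to see.
parent_a = "A" * 12
parent_b = "C" * 12
chimera = make_chimera(parent_a, parent_b, breakpoint=8)

for length in (12, 10, 8):
    # Compare the truncated chimera with the equally truncated parent A.
    diff = mismatches(chimera[:length], parent_a[:length])
    print(f"length {length}: {diff} positions differ from parent A")
```

Once the sequence is cut at or before the break point, the fragment is identical to one parent and no method could recognise it as chimeric, even though the original molecule was.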

But if I decrease the sequence length to, for example, 500 bp, will that be too short? Does sequence length affect the downstream calculations, such as clustering and OTU-based analysis?
Because there are few papers about sequence length, I am afraid that if I shorten the sequences, some information will be lost and the results will not be accurate.
There is a newer method called "pyrosequencing". The sequences I have seen from it are short, only about 300 bp. However, the sequences in the "traditional" clone library I use now are comparatively longer. Why? What is the difference between these two methods?

Yeah, this is a big issue. I have a paper that I just published in PLoS Computational Biology, Rob Knight has a couple of papers about read length, and the RDP people addressed this in their Wang paper describing the Bayesian classifier. Length matters. Because the 16S rRNA gene does not evolve evenly along its length, analyzing short and long sequences will affect how you interpret the results. That being said, the pipeline doesn’t really change if your sequences are 60 bp or 1500 bp.

As far as differences between Sanger and pyrosequencing, there are many, and you need to select a method to suit your question. If you are doing phylogeny or probe design, you need longer Sanger reads. If you need a large number of reads, pyrosequencing is the way. Pyrosequencing is also much cheaper. At a typical university sequencing center one could sequence clones for about $6/read. To sequence ~800,000 pyrosequences it will cost about $8000. I can spread those sequences across ~100 samples, so each sequence will run me ~$0.01, and each sample will cost about $80 and yield ~8,000 reads. For the same amount of money you would get only about 13 Sanger reads. Most sequencing centers will allow you to multiplex your sample with others so you don’t have to pay for the full plate. Again, it depends on your question, but I can’t foresee why anyone with access to pyrosequencing would do Sanger if they are interested in what mothur has to offer.
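
The cost comparison can be recomputed from the inputs quoted above: about $8000 per run, ~800,000 reads, ~100 samples, and about $6 per Sanger read. The Python sketch below does only that arithmetic; the variable names are illustrative and the prices are the approximate ones from the post, not current quotes.

```python
# Inputs as quoted in the post (all approximate).
run_cost = 8000.0            # dollars for one pyrosequencing run
run_reads = 800_000          # reads obtained from that run
samples = 100                # samples multiplexed on the run
sanger_cost_per_read = 6.0   # dollars per Sanger read

cost_per_read = run_cost / run_reads
cost_per_sample = run_cost / samples
reads_per_sample = run_reads // samples

# Number of whole Sanger reads the per-sample budget would buy.
sanger_reads_same_money = int(cost_per_sample // sanger_cost_per_read)

print(f"pyrosequencing cost per read:   ${cost_per_read:.2f}")
print(f"pyrosequencing cost per sample: ${cost_per_sample:.0f}")
print(f"pyrosequencing reads per sample: {reads_per_sample}")
print(f"Sanger reads for the same money: {sanger_reads_same_money}")
```

With these inputs the per-sample cost is the run cost divided by 100 samples, about $80 for roughly 8,000 reads, which would pay for about 13 Sanger reads.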

Pat