filter

dong5600 · September 11, 2010, 6:09am

Dear all:
I am working on a 454 pyrosequencing analysis, following the steps in the Costello Stool Analysis example.
I have about 20k reads. The summaries after “align” and the following “screen” both show that my reads are all >300 bp. But after the subsequent filter step to remove gaps and “.”, I found that the beginnings of the sequences were truncated and the lengths had dropped to 245-288 bp. The command I used was “filter.seqs(fasta=sample.trim.unique.good.align, vertical=T, trump=., processors=2)”. I used the same method to analyze a subset of the sequences from the same samples (about 10k reads in total), and there was no change in length before and after the filter. I am not sure what is happening, and I am worried that this truncation will make a significant difference to the final results. Please advise on a solution. Many thanks!
The following is an example of the same sequence before and after filtering; the underlined and italic letters are the part trimmed by the filter:

before
ATTGAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCACGGGAGCTTGCTCCTGGTGGCGAGTGGCGAACGGGTGAGTAATATATCGGAACGTACCCAGTGGTGGGGGATAGCCCGGCGAAAGCCGGATTAATACCGCATACGATCTACGGATGAAAGCGGGGGATCGCAAGACCCCGCGCTATTGGAGCGGCCGATATCTGATTAGCTAGTTGGTAGGGTAAAAGCCTACCAAGGCTACGATCAGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGAAGGCA[b]GCAGACGCAC[/b]

after
CTTGCTCCTGGTGGCGAGTGGCGAACGGGTGAGTAATATATCGGAACGTACCCAGTGGTGGGGGATAGCCCGGCGAAAGCCGGATTAATACCGCATACGATCTACGGATGAAAGCGGGGGATCGCAAGACCCCGCGCTATTGGAGCGGCCGATATCTGATTAGCTAGTTGGTAGGGTAAAAGCCTACCAAGGCTACGATCAGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGAAGGCA

pschloss · September 13, 2010, 4:54pm

That sounds about right. Remember that trump=. removes every alignment column in which any sequence has a “.”, so all of the sequences get trimmed to the region covered by the shortest sequence in alignment space. So you should have a sequence that is about 300 bp. Then, because sequences have insertions and deletions, the other sequences will come out shorter or longer than that. If you want to be sure that all of the sequences are >300 bp after trump=., you might specify a minimum length of 350 or so in screen.seqs.

Hope this helps,
Pat
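
To make the trimming described above concrete, here is a minimal Python sketch of column filtering on a toy alignment. It is not mothur's code: the function names, the toy sequences, and the exact rules (drop a column if any sequence has the trump character there; with vertical on, drop a column if every sequence has "-" or "." there) are assumptions made for illustration only.

```python
def filter_alignment(aligned, trump=".", vertical=True):
    """Toy column filter over equal-length aligned sequences (illustrative only)."""
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("all aligned sequences must have the same length")

    keep = []
    for col in range(length):
        column = [aligned[name][col] for name in names]
        # trump: drop the column if ANY sequence has the trump character here
        if trump is not None and trump in column:
            continue
        # vertical: drop the column if EVERY sequence has a gap character here
        if vertical and all(base in "-." for base in column):
            continue
        keep.append(col)

    return {name: "".join(aligned[name][c] for c in keep) for name in names}


def ungapped_length(seq):
    """Number of bases in a sequence, ignoring '-' and '.' characters."""
    return sum(1 for base in seq if base not in "-.")


# Hypothetical toy alignment: "short" starts late and ends early,
# so it is padded with "." at both ends.
toy = {
    "long":  "ACGTACGTAC-GTACGT",
    "short": "....ACGTACAGTA...",
    "indel": "ACGTAC--ACAGTACGT",
}

filtered = filter_alignment(toy)
for name in toy:
    print(name, ungapped_length(toy[name]), "->", ungapped_length(filtered[name]))
```

In this toy case a single short read is enough to cut every sequence down to the ten columns it covers: "long" goes from 16 to 9 bases, "indel" from 15 to 8, and "short" stays at 10. The internal gaps are why the final lengths differ from one sequence to the next.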
