unaligned after align.seqs

l.franca · February 8, 2011, 3:36pm

hello

IÂ´m sure this a recurrent problem but i`m missing something that i cannot figure out.

I used just a forward primer in 454 amplification to V6 region.

After:

trim.seqs(fasta=F2.fna, oligos=barcode.oligos, qfile=F2.qual, maxambig=0, maxhomop=8, bdiffs=1, pdiffs=2,qwindowaverage=35, qwindowsize=50, minlength=250, maxlength=500)

I get:

mothur > summary.seqs(fasta=F2.fasta)

Start End NBases Ambigs Polymer
Minimum: 1 250 250 0 3
2.5%-tile: 1 278 278 0 4
25%-tile: 1 361 361 0 5
Median: 1 392 392 0 5
75%-tile: 1 423 423 0 5
97.5%-tile: 1 491 491 0 6
Maximum: 1 500 500 0 8

of Seqs: 14628

After command [b]uniq.seqs[/b]:

mothur > summary.seqs(fasta=F2.unique.fasta)

Start End NBases Ambigs Polymer
Minimum: 1 250 250 0 3
2.5%-tile: 1 278 278 0 4
25%-tile: 1 360 360 0 5
Median: 1 392 392 0 5
75%-tile: 1 425 425 0 5
97.5%-tile: 1 491 491 0 6
Maximum: 1 500 500 0 8

of Seqs: 13593

After command align.seqs(candidate=F2.unique.fasta, template=silva.bacteria.fasta, flip=t)
(none sequence was flipped)

mothur > summary.seqs(fasta=F2.unique.align)

Start End NBases Ambigs Polymer
Minimum: 1044 5271 221 0 3
2.5%-tile: 1044 6104 277 0 4
25%-tile: 1044 8186 360 0 5
Median: 1044 8507 392 0 5
75%-tile: 1044 9964 425 0 5
97.5%-tile:1044 13862 491 0 6
Maximum: 1467 15641 500 0 8

of Seqs: 13593

This summary shows there is a problem in the alignment. It seems there is a perfect alignment in the left column (i can cut all the sequences that do not start on 1044) but, on the other hand, there are serious problems on the right side. If i reject all the sequences that do not represent the median values i will lost a serious amount of data.

How can i improve this?

Thanks for your help.

ajone · February 8, 2011, 4:55pm

As your sequences still have various length(ranging from 250 to 500) this is perfectly fine. Off topic. How Manu sequences did have before trimming?

l.franca · February 8, 2011, 5:44pm

I had about 16K sequences. I lost almost 2000 sequences in the trimming process. Do you think i could have defined a minimum lenght too high? The average lenght is 390.

i followed the common procedure and after filter.seqs(trump=., vertical=T) i had:

Start End NBases Ambigs Polymer
Minimum: 1 529 117 0 3
2.5%-tile: 50 544 123 0 3
25%-tile: 50 544 142 0 4
Median: 50 544 146 0 5
75%-tile: 50 544 155 0 5
97.5%-tile: 53 544 165 0 6
Maximum: 108 544 210 0 8

of Seqs: 13593

I think this is right, isnÂ´t it?

ajone · February 8, 2011, 6:45pm

Only 2000 sequences. It is very few sequences to lose after such a stringent trimming. Are these sequences you work with in some way preprocessed? Another thing I wondering about is that after the filter.seqs command your median length (146bp) is almost 100 bp. below your min. length in the trimming process. also there seems to be problems as some of your sequences do not start at position 1 after filtering.

l.franca · February 8, 2011, 7:18pm

No, i dont think so. The raw files were given to me by the sequencing facility without any pretreatment.

And you are right, after align.seqs command, the sequences should start at position 1.

However, i was able to “correct” this alignment problem, following a simple procedure described in sipos et al. “Robust computational analysis of rRNA hypervariable tag datasets”. This article state that a consistent alignment could be achieved merging an alignment produced by NAST+Silva (mothur) with an alignment made by infernal algorith (RDP). They developed a tool that can merge the best alignments from each tool by combining the hypervariable regions aligned using the NAST algorithm with the regions of strong secondary structure aligned by infernal.

I did this and after filter.seqs i have:

Start End NBases Ambigs Polymer
Minimum: 1 505 213 0 3
2.5%-tile: 1 505 219 0 4
25%-tile: 1 505 247 0 4
Median: 1 505 255 0 5
75%-tile: 1 505 260 0 5
97.5%-tile: 1 505 274 0 6
Maximum: 1 505 281 0 8

of Seqs: 12705

I think this is right now. What do you think about this strategy?

thanks

Topic		Replies	Views
1 sequence remains after align.seqs Commands in mothur	7	10104	June 17, 2011
align.seqs and no of bases Commands in mothur	5	2940	January 16, 2015
strange behaviour align.seqs with samples from same run Commands in mothur	5	4954	November 20, 2014
Short sequences after align.seqs Commands in mothur	3	117	April 6, 2024
seqs length in align.seqs command Commands in mothur	2	2579	June 18, 2014

unaligned after align.seqs

of Seqs: 14628

of Seqs: 13593

of Seqs: 13593

of Seqs: 13593

of Seqs: 12705

Related topics