align.seqs and number of bases

Hello there,

I am having some issues after I run align.seqs.

mothur > summary.seqs(fasta=stability.trim.contigs.good.unique.fasta, count=stability.trim.contigs.good.count_table, processors=2)


Using 2 processors.

                Start   End     NBases  Ambigs  Polymer NumSeqs
Minimum:        1       5       5       0       1       1
2.5%-tile:      1       135     135     0       4       11367
25%-tile:       1       160     160     0       5       113670
Median:         1       229     229     0       11      227340
75%-tile:       1       229     229     0       13      341009
97.5%-tile:     1       229     229     1       20      443312
Maximum:        1       229     229     48      142     454678
Mean:   1       201.683 201.683 0.0498419       10.4269
# of unique seqs:       333146
total # of seqs:        454678

Output File Names:
stability.trim.contigs.good.unique.summary

This is what things look like before I run align.seqs. Then I run:

align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=../bin/silva.bacteria/silva.bacteria.fasta)
summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, processors=2)

Using 2 processors.

                Start   End     NBases  Ambigs  Polymer NumSeqs
Minimum:        0       0       0       0       1       1
2.5%-tile:      1044    1044    1       0       1       2782
25%-tile:       1044    1058    5       0       2       27817
Median:         6430    13125   8       0       2       55634
75%-tile:       43102   43116   11      0       2       83451
97.5%-tile:     43115   43116   160     0       10      108486
Maximum:        43116   43116   229     28      23      111267
Mean:   22362.1 23407.5 29.3862 0.0170041       2.5103
# of unique seqs:       99373
total # of seqs:        111267

Output File Names:
stability.trim.contigs.good.unique.summary

I do not understand why, when I run summary.seqs the second time, the number of bases (NBases) in most of my sequences is so low: the median drops from 229 to 8. I also appear to lose a large proportion of my sequences, because the total falls from 454678 to 111267. I have tried trimming the alignment using pcr.seqs, but with the same result.
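For anyone reading the second table: in an aligned file, Start and End are alignment column positions rather than positions within the read, and NBases counts the actual bases between the gaps. The following is a minimal Python sketch, not mothur's code, of how such per-sequence values can be derived. It assumes '-' and '.' are the only gap characters and that columns are 1-based, and `summarize_aligned` and `read_fasta` are hypothetical helper names. mothur's exact conventions may differ.

```python
# Minimal sketch (not mothur's code) of per-sequence Start/End/NBases
# for an aligned FASTA file. Assumption: '-' and '.' are the only gap symbols.
GAP_CHARS = set("-.")


def summarize_aligned(seq):
    """Return (start, end, nbases) for one aligned sequence.

    start and end are the 1-based alignment columns of the first and last
    non-gap characters; all three values are 0 if the sequence has no bases.
    """
    positions = [i for i, ch in enumerate(seq, start=1) if ch not in GAP_CHARS]
    if not positions:
        return (0, 0, 0)
    return (positions[0], positions[-1], len(positions))


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Toy example: four bases placed around column 1044 of a wide alignment.
toy = "." * 1043 + "AC" + "-" * 8 + "GT" + "." * 20
print(summarize_aligned(toy))  # (1044, 1055, 4)
```

Looping `read_fasta` over an open handle to stability.trim.contigs.good.unique.align and filtering on the third value is one way to pull out the sequences with very low NBases and inspect them directly.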

Could anybody shed any light on this? Thanks in advance.

Jo

Have you tried running align.seqs with flip=t? If the flip parameter is set to true, the reverse complement of the sequence is also aligned, and the better of the two alignments is reported.
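The idea behind flip=t, as described in the reply above, can be sketched in a few lines of Python: build the reverse complement, score both orientations against the template, and keep whichever scores higher. This is an illustration only. mothur compares real alignments against the reference, whereas `kmer_score` here is a hypothetical stand-in that simply counts shared k-mers, and all function names are made up for the example.

```python
# Illustrative sketch of "align both orientations, keep the better one".
# kmer_score is a hypothetical stand-in for a real alignment score.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_score(query, template, k=8):
    """Number of distinct k-mers of the query that also occur in the template."""
    template_kmers = {template[i:i + k] for i in range(len(template) - k + 1)}
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    return len(query_kmers & template_kmers)


def best_orientation(query, template, k=8):
    """Score the query as given and reverse-complemented; keep the better one."""
    forward = kmer_score(query, template, k)
    rc = reverse_complement(query)
    reverse = kmer_score(rc, template, k)
    if reverse > forward:
        return rc, "reverse", reverse
    return query, "forward", forward


template = "ACGTTGCATGCCAGTACGATCGATCGGATCCA"  # toy "reference"
query = reverse_complement(template[5:25])     # a read in the opposite direction
seq, orientation, score = best_orientation(query, template)
print(orientation)  # reverse
```

If reads are mostly in the wrong orientation, this kind of check rescues them; if flip=t changes little, as turned out to be the case here, the problem probably lies elsewhere.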

Hello,

Following your advice, I re-ran align.seqs with flip=t. Unfortunately, summary.seqs shows a similar picture:

mothur > summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, processors=2)


Using 2 processors.

                Start   End     NBases  Ambigs  Polymer NumSeqs
Minimum:        0       0       0       0       1       1
2.5%-tile:      2       10      3       0       1       3231
25%-tile:       2       6699    8       0       2       32302
Median:         20727   21228   13      0       2       64604
75%-tile:       20746   21228   28      0       3       96905
97.5%-tile:     21223   21228   160     0       10      125976
Maximum:        21228   21228   229     27      23      129206
Mean:   13816.1 15055.6 32.5059 0.0220268       2.69123
# of unique seqs:       117298
total # of seqs:        129206

Any further advice would be appreciated.

Jo

Can you post the align.seqs command you ran to generate the align file? It looks like you used the output of pcr.seqs to generate a region-specific reference. Can you try again with silva.bacteria.fasta?
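The guess about the reference can be checked directly. In an aligned FASTA file every record has the same length (the number of alignment columns), and the Start/End values from summary.seqs are column positions, so they cannot exceed that width. End tops out at 21228 in the flip=t run but at 43116 in the first run, which is consistent with two different references having been used. A minimal Python sketch, assuming a standard multi-line FASTA file; `alignment_widths` is a hypothetical helper, not part of mothur.

```python
from collections import Counter


def alignment_widths(lines):
    """Count how many records in an aligned FASTA have each total length.

    A well-formed alignment gives a single length: its number of columns.
    """
    widths = Counter()
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                widths[length] += 1
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        widths[length] += 1
    return widths


# Hypothetical path: point it at the reference actually passed to align.seqs,
# or at the .align output, which should share the reference's width.
# with open("path/to/reference.fasta") as handle:
#     print(alignment_widths(handle))
```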

Hello, and sorry for the delay in replying.

I have tried align.seqs with both the output of pcr.seqs and the complete silva.bacteria.fasta alignment. The output of summary.seqs remains the same:

align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=…/bin/silva.bacteria/silva.bacteria.fasta, flip=t, processors=2)

summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, processors=2)

Using 2 processors.

                Start   End     NBases  Ambigs  Polymer NumSeqs
Minimum:        0       0       0       0       1       1
2.5%-tile:      1044    1056    4       0       1       11367
25%-tile:       1044    1067    11      0       2       113670
Median:         6428    13125   13      0       2       227340
75%-tile:       43024   43116   140     0       4       341009
97.5%-tile:     43107   43116   161     0       10      443312
Maximum:        43116   43116   229     46      74      454678
Mean:   15699.9 17915.9 58.8197 0.0351831       3.08064
# of unique seqs:       333146
total # of seqs:        454678

Output File Names:
stability.trim.contigs.good.unique.summary

Thanks,

Jo

Can you post stability.trim.contigs.good.unique.fasta somewhere for us to pull down and take a look at? Then email us at mothur.bugs@gmail and include a link to this thread. We'll take a look and get back to you here.

Pat