18S Sequences Too Short After Alignment with SILVA nr_v138_2 – Mothur v1.48.0

Hello

We’re trying to process 18S sequences (V1-V2 regions of the 18S rRNA gene, Primer Euk82F and Euk516F) with Mothur, version 1.48.0.

After alignment on the Silva.nr_v138_2.align database, we get too short sequences with 95% title containing 450 bp (the maxlenght given in screen.seq) before alignment and 25% tile containing 10 bp.

mothur > summary.seqs(fasta=18s.trim.contigs.fasta, count=18s.contigs.count_table)

Using 40 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	282	282	0	3	1
2.5%-tile:	1	399	399	1	4	86005
25%-tile:	1	445	445	6	4	860046
Median: 	1	446	446	8	5	1720091
75%-tile:	1	450	450	11	6	2580136
97.5%-tile:	1	454	454	20	7	3354177
Maximum:	1	567	567	78	264	3440181
Mean:	1	444	444	8	5
# of unique seqs:	3440181
total # of seqs:	3440181

It took 112 secs to summarize 3440181 sequences.

Output File Names:
18s.trim.contigs.summary


mothur > screen.seqs(fasta=18s.trim.contigs.fasta, count=18s.contigs.count_table,summary=18s.trim.contigs.summary, maxambig=0, maxlength=450, maxhomop=8)


mothur > summary.seqs(count=current)
Using 18s.contigs.good.count_table as input file for the count parameter.
Using 18s.trim.contigs.good.fasta as input file for the fasta parameter.

Using 40 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	284	284	0	3	1
2.5%-tile:	1	293	293	0	3	930
25%-tile:	1	307	307	0	3	9297
Median: 	1	309	309	0	4	18594
75%-tile:	1	312	312	0	4	27891
97.5%-tile:	1	422	422	0	7	36258
Maximum:	1	450	450	0	8	37187
Mean:	1	314	314	0	3
# of unique seqs:	37187
total # of seqs:	37187

It took 1 secs to summarize 37187 sequences.


mothur > align.seqs(fasta=18s.trim.contigs.good.unique.fasta, reference=silva.nr_v132.align)


mothur > summary.seqs(fasta=18s.trim.contigs.good.unique.align, count=18s.trim.contigs.good.count_table)

Using 40 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	0	0	0	0	1	1
2.5%-tile:	1046	1056	4	0	1	930
25%-tile:	1046	1083	8	0	2	9297
Median: 	43053	43116	11	0	2	18594
75%-tile:	43096	43116	16	0	3	27891
97.5%-tile:	43112	43116	422	0	6	36258
Maximum:	43116	43116	450	0	8	37187
Mean:	26606	27490	45	0	2
# of unique seqs:	29090
total # of seqs:	37187

It took 6 secs to summarize 37187 sequences.

In conclusion, we obtain sequences that are too short after alignment for the 18s

Hi there,

Could you possibly try using flip=T in align.seqs? Also, do you know how these sequences were generated?

Pat

Hello
I have redone the steps by including the flip=T in align.seq , and I get the same result.


mothur > screen.seqs(fasta=18s.trim.contigs.fasta, count=18s.contigs.count_table,summary=18s.trim.contigs.summary, maxambig=0, maxlength=550, maxhomop=8)


mothur > summary.seqs(count=current)
Using 18s.contigs.good.count_table as input file for the count parameter.
Using 18s.trim.contigs.good.fasta as input file for the fasta parameter.

Using 32 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	284	284	0	3	1
2.5%-tile:	1	293	293	0	3	934
25%-tile:	1	307	307	0	3	9334
Median: 	1	309	309	0	4	18668
75%-tile:	1	312	312	0	4	28001
97.5%-tile:	1	422	422	0	7	36401
Maximum:	1	545	545	0	8	37334
Mean:	1	315	315	0	3
# of unique seqs:	37334
total # of seqs:	37334

It took 1 secs to summarize 37334 sequences.

Output File Names:
18s.trim.contigs.good.summary



mothur > align.seqs(fasta=18s.trim.contigs.good.unique.fasta, reference= silva.nr_v138_2.align, flip=T)


mothur > summary.seqs(fasta=18s.trim.contigs.good.unique.align, count=18s.trim.contigs.good.count_table)

Using 32 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	0	0	0	0	1	1
2.5%-tile:	1044	1058	4	0	1	934
25%-tile:	4728	11885	9	0	2	9334
Median: 	43056	43116	11	0	2	18668
75%-tile:	43096	43116	18	0	3	28001
97.5%-tile:	43112	43116	422	0	6	36401
Maximum:	43116	43116	498	0	8	37334
Mean:	30148	31051	46	0	2
# of unique seqs:	29224
total # of seqs:	37334

It took 6 secs to summarize 37334 sequences.

Output File Names:
18s.trim.contigs.good.unique.summary

The sequences were generated on an Illumina Miseq platform with paired-end reads.

Thanks

Can you post one of the sequences that is getting truncated during alignment?

Pat

Hello

In “18s.trim.contigs.good.unique.align_report”

M07192_54_000000000-LLDG7_1_2118_15178_15837 307 AY437674.FmbDicho 1714 kmer 7.00 needleman 1 11 1794 1804 11 0 0 0 72.72

In “18s.trim.contigs.good.unique.fasta”

M07192_54_000000000-LLDG7_1_2118_15178_15837 ee=2.90838
GGAGCAAGGAACCTGGGAATGGCTCGTAATGAAATTAGGCAAACTAAAGTGTCCTACCAGTTTGGCGTGCATTTGCGTACTCCCAGTTACAAGCAAACGCACATACAGACACTTGTTCGATGGTTTCCACGAGACCTGGCTTGCCTAATGGATGTTGACGGTTTGTGTGTGTGTGTGTGAGAGAGTGAGTGAATTGAAGTAAGTACGTCAATGGGCTTCACACCGCACAGCACCGAGAGAGAAGGAGTGTGTTCGGAGCTTGCGGTTCCGCTGCGTCTACCTCGGAGGGCAAGTCTGGTCTGTCTCT

Thanks

In “18s.trim.contigs.good.unique.align”

M07192_54_000000000-LLDG7_1_2118_15178_15837

I had to delete the beginnings and ends of empty alignments due to a lack of characters to enter when replying.

…GG–A-GC-A-A----GG-AA…

Thanks - I wonder if there’s a problem with your sequencing. I tried blasting your sequence against nt (blastn) and nr (blastx) and there were no significant hits.

GGAGCAAGGAACCTGGGAATGGCTCGTAATGAAATTAGGCAAACTAAAGTGTCCTACCAGTTT
GGCGTGCATTTGCGTACTCCCAGTTACAAGCAAACGCACATACAGACACTTGTTCGATGGTTT
CCACGAGACCTGGCTTGCCTAATGGATGTTGACGGTTTGTGTGTGTGTGTGTGAGAGAGTGAG
TGAATTGAAGTAAGTACGTCAATGGGCTTCACACCGCACAGCACCGAGAGAGAAGGAGTGTGT
TCGGAGCTTGCGGTTCCGCTGCGTCTACCTCGGAGGGCAAGTCTGGTCTGTCTCT

You can find the blast report here: NCBI Blast:Nucleotide Sequence

What kind of samples are you sequencing? Do you know if the run quality was good?

Pat

1 Like