Start-end position screen.seqs

Hi all,

I have a problem finding the start and end positions for screen.seqs after alignment.
For context, the analysis below is from 197 samples of different water bodies, sequenced with Illumina PE 300 for the v3v4 region (341F, 806R). However, I only used the R1 reads, since R2 had bad quality and lots of N's. Besides that, I only kept the first 197 bp, because the quality dropped after that position and the reads contained a lot of N's.



mothur > align.seqs(fasta=NP_197samples.trim.trim.unique.fasta, reference=/store_data/ashraf/outdir/Data/Intensities/BaseCalls/R1/transformed/mothur/silva.bacteria/silva.bacteria.fasta,flip=T)

mothur > summary.seqs()

             Start    End      NBases   Ambigs      Polymer  NumSeqs
Minimum:     -1       -1       0        0           1        1
2.5%-tile:   6332     13859    197      0           4        163831
25%-tile:    6333     13862    197      0           4        1638302
Median:      6333     14956    197      0           4        3276603
75%-tile:    6333     14961    197      0           5        4914904
97.5%-tile:  6333     14963    197      0           7        6389375
Maximum:     43116    43116    197      6           124      6553205
Mean:        6486.25  14559.3  195.521  0.00086782  4.63685

# of Seqs: 6553205

mothur > screen.seqs(fasta=NP_197samples.trim.trim.unique.align, count=NP_197samples.trim.trim.count_table, summary=NP_197samples.trim.trim.unique.summary, start=6332, end=14963,processors=12)
mothur > summary.seqs(fasta=current,count=current)

             Start    End      NBases  Ambigs       Polymer  NumSeqs
Minimum:     6326     14963    197     0            3        1
2.5%-tile:   6332     14963    197     0            4        896
25%-tile:    6332     14963    197     0            4        8955
Median:      6332     14963    197     0            4        17910
75%-tile:    6332     14963    197     0            5        26864
97.5%-tile:  6332     14965    197     0            6        34923
Maximum:     6332     15649    197     3            18       35818
Mean:        6331.99  14963.2  197     0.000558378  4.52328

# of unique seqs: 20020

total # of seqs: 35818

What caused this reduction from over 6 million reads to 35K reads? What did I do wrong?

Try running this instead…

screen.seqs(fasta=NP_197samples.trim.trim.unique.align, count=NP_197samples.trim.trim.count_table, summary=NP_197samples.trim.trim.unique.summary, start=6333, end=14963,processors=12)

start is the position that you want your sequences to start at or before, and end is the position that you want your sequences to end at or after. In your first summary, most sequences start at 6333, so start=6332 rejected them.
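
To make the rule concrete, here is a minimal Python sketch of the start/end check. It is not mothur code: `passes_screen` is a made-up helper for illustration, and it assumes the only criteria in play are start and end (a sequence is kept when it starts at or before `start` and ends at or after `end`). The example read uses positions taken from the first summary above.

```python
def passes_screen(seq_start, seq_end, start, end):
    """Hypothetical helper: True if an aligned sequence starts at or
    before `start` and ends at or after `end`."""
    return seq_start <= start and seq_end >= end


# A read that starts at 6333 (the most common start position in the
# first summary) and ends at 14963.
read_start, read_end = 6333, 14963

# With start=6332 the read begins one column too late and is removed.
print(passes_screen(read_start, read_end, start=6332, end=14963))

# With start=6333 the same read is kept.
print(passes_screen(read_start, read_end, start=6333, end=14963))
```

The first call prints False and the second prints True, so moving start by a single alignment column decides whether this read survives.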

Also, in case you missed it (and I see you are experiencing this), you might want to show the powers that be this…

http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix%3F/