Hi all,
I am having trouble finding the right start and end positions for screen.seqs after alignment.
For context, the analysis below is from 197 samples from different water bodies, sequenced with paired-end 300 bp Illumina reads for the V3-V4 region (341F, 806R). However, I only used the R1 reads, since R2 had poor quality and lots of N’s. I also kept only the first 197 bp, because the quality dropped after that position and the reads contained a lot of N’s.
mothur > align.seqs(fasta=NP_197samples.trim.trim.unique.fasta, reference=/store_data/ashraf/outdir/Data/Intensities/BaseCalls/R1/transformed/mothur/silva.bacteria/silva.bacteria.fasta,flip=T)
mothur > summary.seqs()
                Start    End      NBases   Ambigs      Polymer  NumSeqs
Minimum:        -1       -1       0        0           1        1
2.5%-tile:      6332     13859    197      0           4        163831
25%-tile:       6333     13862    197      0           4        1638302
Median:         6333     14956    197      0           4        3276603
75%-tile:       6333     14961    197      0           5        4914904
97.5%-tile:     6333     14963    197      0           7        6389375
Maximum:        43116    43116    197      6           124      6553205
Mean:           6486.25  14559.3  195.521  0.00086782  4.63685
# of Seqs:      6553205
mothur > screen.seqs(fasta=NP_197samples.trim.trim.unique.align, count=NP_197samples.trim.trim.count_table, summary=NP_197samples.trim.trim.unique.summary, start=6332, end=14963,processors=12)
mothur > summary.seqs(fasta=current,count=current)
                    Start    End      NBases   Ambigs       Polymer  NumSeqs
Minimum:            6326     14963    197      0            3        1
2.5%-tile:          6332     14963    197      0            4        896
25%-tile:           6332     14963    197      0            4        8955
Median:             6332     14963    197      0            4        17910
75%-tile:           6332     14963    197      0            5        26864
97.5%-tile:         6332     14965    197      0            6        34923
Maximum:            6332     15649    197      3            18       35818
Mean:               6331.99  14963.2  197      0.000558378  4.52328
# of unique seqs:   20020
total # of seqs:    35818
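For reference, here is how I understand the start/end criteria in screen.seqs, written as a small Python sketch. This is not mothur's code, and the function name `passes_screen` is made up for illustration. The assumption is that a read is kept only if it starts at or before `start` and ends at or after `end`. The example coordinates are taken from the percentile rows of my first summary.seqs output.

```python
def passes_screen(seq_start, seq_end, start, end):
    """Hypothetical helper: True if a read would survive the start/end screen.

    Assumes screen.seqs removes reads that start after `start`
    or end before `end` (alignment coordinates).
    """
    return seq_start <= start and seq_end >= end


# (start, end) pairs taken from the first summary.seqs table above
examples = [
    (6332, 14963),  # 2.5%-tile start, 97.5%-tile end
    (6333, 14963),  # median start, 97.5%-tile end
    (6332, 14956),  # 2.5%-tile start, median end
    (6333, 13862),  # median start, 25%-tile end
]

for seq_start, seq_end in examples:
    kept = passes_screen(seq_start, seq_end, start=6332, end=14963)
    print(seq_start, seq_end, "kept" if kept else "removed")
```

If that reading of the criteria is right, only reads that satisfy both conditions at once are retained.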
What caused the number of sequences to drop from over 6 million reads to about 35 K reads? What did I do wrong?