Hi, I’m new to mothur and, in fact, to analyzing sequence data in general. As I’ve read many times in this forum, working with the V3-V4 region is clearly a mistake; unfortunately, I only learned this after I had already received the data. I analyzed it by following the manual and the forum, but I have a problem with wildly different start positions after alignment.
This is the situation after make.contigs:
             Start  End      NBases   Ambigs    Polymer  NumSeqs
Minimum:     1      250      250      0         3        1
2.5%-tile:   1      274      274      0         4        143919
25%-tile:    1      298      298      0         4        1439188
Median:      1      442      442      0         5        2878375
75%-tile:    1      465      465      0         6        4317562
97.5%-tile:  1      466      466      2         31       5612831
Maximum:     1      500      500      75        250      5756749
Mean:        1      401.079  401.079  0.221197  7.30846
# of Seqs:   5756749
After that, I ran screen.seqs:
screen.seqs(fasta=stability.trim.contigs.fasta,minlength=10,maxlength=470,maxambig=0,maxhomop=8)
             Start  End     NBases  Ambigs  Polymer  NumSeqs
Minimum:     1      250     250     0       3        1
2.5%-tile:   1      274     274     0       4        125262
25%-tile:    1      298     298     0       4        1252620
Median:      1      441     441     0       5        2505240
75%-tile:    1      465     465     0       6        3757860
97.5%-tile:  1      466     466     0       6        4885218
Maximum:     1      470     470     0       8        5010479
Mean:        1      398.46  398.46  0       4.82017
# of Seqs:   5010479
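To make the screening criteria concrete, here is a rough Python sketch of what the parameters above ask for. It is only an illustration, not mothur's implementation: the function names are hypothetical, and counting every non-ACGT character as ambiguous is a simplifying assumption.

```python
import re


def longest_homopolymer(seq: str) -> int:
    """Return the length of the longest run of one repeated character."""
    runs = (len(m.group(0)) for m in re.finditer(r"(.)\1*", seq))
    return max(runs, default=0)


def passes_screen(seq: str, minlength: int = 10, maxlength: int = 470,
                  maxambig: int = 0, maxhomop: int = 8) -> bool:
    """Apply length, ambiguity, and homopolymer limits to one sequence.

    Defaults mirror the screen.seqs call above. Treating any non-ACGT
    character as ambiguous is an assumption made for this sketch.
    """
    seq = seq.upper()
    ambigs = sum(1 for base in seq if base not in "ACGT")
    return (minlength <= len(seq) <= maxlength
            and ambigs <= maxambig
            and longest_homopolymer(seq) <= maxhomop)
```

With these defaults a 250 bp contig still passes, because minlength=10 removes almost nothing on the short side; only the upper length limit, the ambiguity limit, and the homopolymer limit do any real filtering.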
Then I ran unique.seqs, created a count table, and ran align.seqs against a tailored reference fasta that I produced from “silva.bacteria.fasta” by trimming it to start position 3388 and end position 25316.
This is what I got:
                  Start    End    NBases   Ambigs  Polymer  NumSeqs
Minimum:          0        0      0        0       1        1
2.5%-tile:        1        18929  7        0       2        125262
25%-tile:         1        18929  298      0       4        1252620
Median:           1        18929  441      0       5        2505240
75%-tile:         3979     18929  465      0       6        3757860
97.5%-tile:       8578     18929  466      0       6        4885218
Maximum:          18929    18929  470      0       8        5010479
Mean:             1874.55  18718  387.523  0       4.70551
# of unique seqs: 2887389
total # of seqs:  5010479
As I understand it, this is not how the alignment should look, and I do not know how to fix it. What should I do?
Thanks in advance for your time and help.
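A quick arithmetic check on the reference coordinates may also be relevant. Assuming the reference was trimmed so that columns outside the chosen window were removed (for example pcr.seqs with keepdots=F; the trimming command is not shown above, so this is an assumption), the trimmed alignment should be end - start + 1 columns wide. The sketch below compares that width with the End value of 18929 in the alignment summary; the function names are hypothetical.

```python
def trimmed_width(start: int, end: int) -> int:
    """Number of alignment columns kept when trimming to [start, end], inclusive."""
    return end - start + 1


def start_for_width(end: int, width: int) -> int:
    """Start coordinate that would give `width` columns ending at `end`."""
    return end - width + 1


# Coordinates as written in the post.
print(trimmed_width(3388, 25316))    # 21929
# Start coordinate implied by the End value of 18929 in the summary.
print(start_for_width(25316, 18929)) # 6388
```

With the coordinates as written, the trimmed reference would be 21929 columns wide rather than 18929. A width of 18929 corresponds to a start position of 6388, so the start coordinate is worth double-checking.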
E.T.
Update:
I set the “minlength” parameter of screen.seqs to 390 bp and ran it again, then followed the manual as I did before. The result confused me even more. This is the summary after align.seqs:
                  Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:          1        14       3        0       1        1
2.5%-tile:        1        18928    439      0       4        80931
25%-tile:         1        18928    441      0       4        809302
Median:           1        18928    459      0       5        1618603
75%-tile:         1        18928    464      0       6        2427904
97.5%-tile:       1        18928    465      0       6        3156275
Maximum:          18910    18928    469      0       8        3237205
Mean:             26.8008  18798.8  450.293  0       4.92839
# of unique seqs: 2048792
total # of seqs:  3237205
I searched for an explanation but could not find one. I used the same reference fasta as before. What does this mean?
Thanks again for your time and help.
E.T.