Hi
I’m working with 61 samples, Illumina, that correspond to 9340562 seqs.
I was able to do all procedures described on http://www.mothur.org/wiki/MiSeq_SOP until I reached the screen.seqs step after alignment.
I already rerun the command several times and I have different results, but in all the number of sequences that remain after screen.seqs command is really really low, despite the summary results looks auspicious. Additionally, even with the start value of 10368, after screening I have sequences starting before, isn’t it odd?
here is a copy of the log file where the summary results are shown and then the results of the screen.seqs.
Thank you for any help
mothur > get.current()
Current files saved by mothur:
fasta=minas_duasamostragens.trim.contigs.good.unique.align
group=minas_duasamostragens.contigs.good.groups
name=minas_duasamostragens.trim.contigs.good.names
count=minas_duasamostragens.trim.contigs.good.count_table
processors=9
summary=minas_duasamostragens.trim.contigs.good.unique.summary
Current working directory: /diag/home/itiago/Minas/
mothur > summary.seqs(fasta=minas_duasamostragens.trim.contigs.good.unique.align, count=minas_duasamostragens.trim.contigs.good.count_table)
Using 9 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: -1 -1 0 0 1 1
2.5%-tile: 10368 28464 411 0 4 220952
25%-tile: 10369 28464 417 0 4 2209519
Median: 10369 28464 418 0 5 4419037
75%-tile: 10369 28464 419 0 6 6628555
97.5%-tile: 10370 28464 420 1 6 8617121
Maximum: 43116 43116 420 2 9 8838072
Mean: 10383 28458.3 417.348 0.0852903 4.96302
of unique seqs: 7614218
total # of seqs: 8838072
Output File Names:
minas_duasamostragens.trim.contigs.good.unique.summary
It took 4353 secs to summarize 8838072 sequences.
mothur > screen.seqs(fasta=minas_duasamostragens.trim.contigs.good.unique.align, count=minas_duasamostragens.trim.contigs.good.count_table, summary=minas_duasamostragens.trim.contigs.good.unique.summary, start=10368, end=28464, processors=20)
Using 20 processors.
Output File Names:
minas_duasamostragens.trim.contigs.good.unique.good.summary
minas_duasamostragens.trim.contigs.good.unique.good.align
minas_duasamostragens.trim.contigs.good.unique.bad.accnos
minas_duasamostragens.trim.contigs.good.good.count_table
It took 5620 secs to screen 7614218 sequences.
mothur > get.current()
Current files saved by mothur:
fasta=minas_duasamostragens.trim.contigs.good.unique.good.align
group=minas_duasamostragens.contigs.good.groups
name=minas_duasamostragens.trim.contigs.good.names
count=minas_duasamostragens.trim.contigs.good.good.count_table
processors=20
summary=minas_duasamostragens.trim.contigs.good.unique.summary
Current working directory: /diag/home/itiago/Minas/
mothur > summary.seqs(fasta=current, count=current)
Using minas_duasamostragens.trim.contigs.good.good.count_table as input file for the count parameter.
Using minas_duasamostragens.trim.contigs.good.unique.good.align as input file for the fasta parameter.
Using 20 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 10245 28464 398 0 3 1
2.5%-tile: 10366 28464 412 0 4 43724
25%-tile: 10368 28464 417 0 4 437237
Median: 10368 28464 418 0 5 874473
75%-tile: 10368 28464 419 0 6 1311709
97.5%-tile: 10368 28464 420 1 7 1705221
Maximum: 10368 29665 420 2 9 1748944
Mean: 10367.8 28464 417.45 0.08753 4.97613
of unique seqs: 1502223
total # of seqs: 1748944
Output File Names:
minas_duasamostragens.trim.contigs.good.unique.good.summary
It took 1064 secs to summarize 1748944 sequences.
#Here I noticed the minimum value that did not matched with the value on command, and the dramatic lost of sequences on the process. So I rerun the command…
mothur > screen.seqs(fasta=minas_duasamostragens.trim.contigs.good.unique.align, count=minas_duasamostragens.trim.contigs.good.count_table, summary=minas_duasamostragens.trim.contigs.good.unique.summary, start=10367, end=28464)
Using 20 processors.
Output File Names:
minas_duasamostragens.trim.contigs.good.unique.good.summary
minas_duasamostragens.trim.contigs.good.unique.good.align
minas_duasamostragens.trim.contigs.good.unique.bad.accnos
minas_duasamostragens.trim.contigs.good.good.count_table
It took 4428 secs to screen 7614218 sequences.
mothur > get.current()
Current files saved by mothur:
fasta=minas_duasamostragens.trim.contigs.good.unique.good.align
group=minas_duasamostragens.contigs.good.groups
name=minas_duasamostragens.trim.contigs.good.names
count=minas_duasamostragens.trim.contigs.good.good.count_table
processors=20
summary=minas_duasamostragens.trim.contigs.good.unique.summary
Current working directory: /diag/home/itiago/Minas/
mothur > summary.seqs(fasta=current, count=current)
Using minas_duasamostragens.trim.contigs.good.good.count_table as input file for the count parameter.
Using minas_duasamostragens.trim.contigs.good.unique.good.align as input file for the fasta parameter.
Using 20 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 10245 28464 404 0 3 1
2.5%-tile: 10366 28464 413 0 4 3719
25%-tile: 10366 28464 417 0 4 37189
Median: 10366 28464 417 0 5 74378
75%-tile: 10366 28464 419 0 6 111566
97.5%-tile: 10366 28464 420 1 7 145036
Maximum: 10367 28478 420 2 9 148754
Mean: 10366 28464 417.39 0.102048 5.0068
of unique seqs: 132441
total # of seqs: 148754
Output File Names:
minas_duasamostragens.trim.contigs.good.unique.good.summary
It took 77 secs to summarize 148754 sequences.
#Again the number of starting point and now the number of sequences is even lower…