Hi,
I’m investigating bacteria associated with fish skin. I amplified the V4 of the 16S, using the primers 515F - 806R, as in the Kozich paper.
Two libraries with 120 and 200 fish samples have been sequenced using Illumina MiSeq with the MiSeq Reagent Kit v2 500 cycles.
I got fastq files, one reverse and one forward for each sample. I made the contigs for each library, cut the sequences with screen.seqs to start=230 and end=275 and maxambig=0, split the libraries according to two different experiments and merged the respective files afterwards.
The summary. seqs generated the following:
mothur v.1.33.3
mothur > summary.seqs(fasta=czallstability.trim.contigs.good.pick.fasta, processors=8)
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 230 230 0 3 1
2.5%-tile: 1 247 247 0 3 318972
25%-tile: 1 248 248 0 4 3189718
Median: 1 252 252 0 4 6379436
75%-tile: 1 252 252 0 5 9569153
97.5%-tile: 1 253 253 0 6 12439899
Maximum: 1 275 275 0 60 12758870
Mean: 1 250.067 250.067 0 4.39062
of Seqs: 12758870
I then followed the SOP with:
unique.seqs(fasta=czallstability.trim.contigs.good.pick.fasta)
and
align.seqs(fasta=czallstability.trim.contigs.good.pick.unique.fasta, reference=silva.bacteria.fasta, processors=8)
and got the following summary:
mothur > summary.seqs(fasta=czallstability.trim.contigs.good.pick.unique.align, count=czallstability.trim.contigs.good.pick.count_table, processors=8)
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 13862 22588 247 0 3 318972
25%-tile: 13862 23439 248 0 4 3189718
Median: 13862 23443 252 0 4 6379436
75%-tile: 13871 23443 252 0 5 9569153
97.5%-tile: 13875 23443 253 0 6 12439899
Maximum: 43116 43116 275 0 60 12758870
Mean: 13867.9 23418.1 250.044 0 4.39032
of unique seqs: 760815
total # of seqs: 12758870
Subsequently, I run the screen command with the following parameters:
mothur > screen.seqs(fasta=czallstability.trim.contigs.good.pick.unique.align, count=czallstability.trim.contigs.good.pick.count_table, start=13875, end=23443, maxhomop=8, processors=8)
which removed about the half of my sequences:
mothur > summary.seqs(fasta=czallstability.trim.contigs.good.pick.unique.good.align, count=czallstability.trim.contigs.good.pick.good.count_table, processors=8)
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 11892 23443 234 0 3 1
2.5%-tile: 13862 23443 252 0 3 163572
25%-tile: 13862 23443 252 0 4 1635717
Median: 13862 23443 252 0 4 3271434
75%-tile: 13862 23443 252 0 5 4907151
97.5%-tile: 13862 23443 253 0 6 6379296
Maximum: 13875 25318 275 0 8 6542867
Mean: 13862 23443.1 252.068 0 4.35542
of unique seqs: 341350
total # of seqs: 6542867
When I run screen. seqs with start=13875 and optimize=end I get
mothur > summary.seqs(fasta=czallstability.trim.contigs.good.pick.unique.good.align, count=czallstability.trim.contigs.good.pick.good.count_table)
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 11892 23439 234 0 3 1
2.5%-tile: 13862 23439 248 0 3 310057
25%-tile: 13862 23439 248 0 4 3100569
Median: 13862 23443 252 0 4 6201137
75%-tile: 13871 23443 252 0 5 9301705
97.5%-tile: 13871 23443 253 0 6 12092216
Maximum: 13875 25318 275 0 8 12402272
Mean: 13866.3 23441.2 250.156 0 4.39429
of unique seqs: 720310
total # of seqs: 12402272
I’m know puzzled why after aligning some of my sequences don’t overlap at all but since more than 75% of them do I still don’t get why I lose half of my sequences when setting the start and end as I did.
Further I’m not sure if I can use the files produces by screen.seqs with optimize=end.
I appreciate if somebody could help me.
Thanks a lot!