Good morning,
I’m helping a collaborator with their analysis and I’ve hit something I’ve never seen before. They did Miseq V1-3 and are using only the R1 read in analysis because the overlap of R1 and R2 was poor (I made them promise to never do V1-3 again and sent them the “huge distance matrix” blog post ).
Here’s what I’ve done:
fastq.info(fastq=SJR1_S251_L001_R1_001.fastq) (did this for each R1 file)
trim.seqs(fasta=SJR1_S251_L001_R1_001.fasta, oligos=SJR1_oligos.txt, qfile=SJR1_S251_L001_R1_001.qual, pdiffs=2, maxambig=0, maxhomop=8, flip=F, qwindowaverage=30, qwindowsize=25, processors=2) (did this for each file)
make.group
merge.files
screen.seqs(fasta=tick.fasta, group=tick.groups, minlength=150, maxlength=350)
unique.seqs(fasta=tick.good.fasta)
count.seqs(name=tick.good.names, group=tick.good.groups)
align.seqs(fasta=tick.good.unique.fasta, reference=silva.v1v3.fasta)
summary.seqs(fasta=tick.good.unique.align, count=tick.good.count_table)
Start End NBases Ambigs Polymer NumSeqs
Minimum: -1 -1 0 0 1 1
2.5%-tile: 2 4 1 0 1 36471
25%-tile: 2 4295 30 0 3 364706
Median: 2 5060 239 0 4 729411
75%-tile: 2 5188 274 0 5 1094116
97.5%-tile: 10853 12081 280 0 5 1422350
Maximum: 12081 12081 283 0 8 1458820
Mean: 1578.71 5367.35 185.771 0 3.675
of unique seqs: 592288
total # of seqs: 1458820
screen.seqs(fasta=tick.good.unique.align, count=tick.good.count_table, summary=tick.good.unique.summary, end=12081)
filter.seqs(fasta=tick.good.unique.good.align, vertical=T, trump=.)
Length of filtered alignment: 1
Number of columns removed: 12082
Length of the original alignment: 12083
Number of sequences used to construct filter: 146562
I read similar posts in the forum and filter.seqs removes all data and tried:
screen.seqs(fasta=tick.good.unique.align, count=tick.good.count_table, optimize=start-end)
Optimizing start to 10841.
Optimizing end to 57.
summary.seqs(fasta=tick.good.unique.good.align, count=tick.good.good.count_table)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 57 6 0 1 1
2.5%-tile: 2 3663 10 0 2 29458
25%-tile: 2 4376 209 0 4 294574
Median: 2 5060 255 0 4 589148
75%-tile: 2 5188 280 0 5 883722
97.5%-tile: 9324 12081 280 0 5 1148838
Maximum: 10841 12081 283 0 8 1178295
Mean: 670.108 5243.43 228.897 0 4.19013
of unique seqs: 428552
total # of seqs: 1178295
filter.seqs(fasta=tick.good.unique.good.align, vertical=T, trump=.)
Length of filtered alignment: 0
Number of columns removed: 12081
Length of the original alignment: 12081
Number of sequences used to construct filter: 428552
I’m not sure what’s going on. The screen.seqs start and end optimization looks weird. Anyone seen this before?
Thank you,
Bonnie