Someone has handed me data from a V3 600-cycle kit. When I use my normal batch (make.contigs with all defaults, then screen.seqs with maxambig=0, maxlength=370), all the sequences get tossed. After make.contigs the seqs are ~350 bp. This group only has the V4 Caporaso primers in their lab, so these have to be V4 sequences. Do you have any suggestions for getting workable data out of the V3 chemistry?
Oops, typo: that should be maxlength=270.
I adjusted my maxlength to 370 (knowing that I'm letting more garbage through).
Summary after make.contigs, screen.seqs, and unique.seqs:
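For context, the batch up to this point was roughly the following (the input file name eoea.files is an assumption; the eoea prefix comes from the output file names below):

mothur > make.contigs(file=eoea.files)
mothur > screen.seqs(fasta=current, group=current, maxambig=0, maxlength=370)
mothur > unique.seqs(fasta=current)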
mothur > summary.seqs(fasta=current, name=current)
Using eoea.trim.contigs.good.unique.fasta as input file for the fasta parameter.
Using eoea.trim.contigs.good.names as input file for the name parameter.
Using 16 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 301 301 0 3 1
2.5%-tile: 1 348 348 0 3 107339
25%-tile: 1 349 349 0 4 1073384
Median: 1 349 349 0 4 2146768
75%-tile: 1 350 350 0 5 3220152
97.5%-tile: 1 350 350 0 6 4186197
Maximum: 1 370 370 0 49 4293535
Mean: 1 349.414 349.414 0 4.35216
# of unique seqs: 4272439
total # of seqs: 4293535
After align.seqs against a SILVA alignment trimmed to V4:
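Presumably something along these lines, with the reference pre-trimmed to the V4 region via pcr.seqs (the coordinates below are the usual mothur MiSeq SOP values for SILVA V4, consistent with the End=13425 column in the summary, but are an assumption here):

mothur > pcr.seqs(fasta=silva.bacteria.fasta, start=11894, end=25319, keepdots=F)
mothur > align.seqs(fasta=eoea.trim.contigs.good.unique.fasta, reference=silva.bacteria.pcr.fasta)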
mothur > summary.seqs(fasta=current, count=current)
Using eoea.trim.contigs.good.count_table as input file for the count parameter.
Using eoea.trim.contigs.good.unique.align as input file for the fasta parameter.
Using 16 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: -1 -1 0 0 1 1
2.5%-tile: 1 13425 292 0 3 107339
25%-tile: 1 13425 292 0 4 1073384
Median: 1 13425 293 0 4 2146768
75%-tile: 1 13425 293 0 4 3220152
97.5%-tile: 1 13425 294 0 6 4186197
Maximum: 13425 13425 323 0 19 4293535
Mean: 110.955 13360.4 288.765 0 4.07184
# of unique seqs: 4272439
total # of seqs: 4293535
After screen.seqs, filter.seqs, and pre.cluster (diffs=2):
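Roughly this, as a sketch; the end coordinate comes from the alignment summary above, and maxhomop=8 is an assumption inferred from the Polymer column in the table below:

mothur > screen.seqs(fasta=current, count=current, start=1, end=13425, maxhomop=8)
mothur > filter.seqs(fasta=current, vertical=T, trim=.)
mothur > pre.cluster(fasta=current, count=current, diffs=2)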
mothur > summary.seqs(fasta=current, count=current)
Using eoea.trim.contigs.good.unique.good.filter.precluster.count_table as input file for the count parameter.
Using eoea.trim.contigs.good.unique.good.filter.precluster.fasta as input file for the fasta parameter.
Using 16 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 725 253 0 3 1
2.5%-tile: 1 817 292 0 3 105872
25%-tile: 1 817 292 0 4 1058711
Median: 1 817 293 0 4 2117422
75%-tile: 1 817 293 0 4 3176133
97.5%-tile: 1 817 294 0 6 4128972
Maximum: 65 817 323 0 8 4234843
Mean: 1.00098 816.999 292.62 0 4.10739
# of unique seqs: 1784585
total # of seqs: 4234843
I think this is still way too many "uniques" for human samples.
You’ll want to use trimoverlap=T in make.contigs, and then you should be able to proceed as usual. I’ve lost track of how often I’ve repeated this, but when the sequencer reads beyond the ends of the fragments, the error rates go up significantly; with 2x300 reads on a ~250 bp V4 amplicon, both reads run well past the end of the fragment. This is on top of the usual craptitude of the V3 chemistry.
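Concretely, that means re-running from the top with the reads trimmed back to the overlapping region, so the contigs come out at the expected ~253 bp instead of ~350 bp (again, eoea.files is an assumed input name):

mothur > make.contigs(file=eoea.files, trimoverlap=T)

Then rescreen with a maxlength appropriate for ~253 bp V4 contigs and proceed as usual.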
Ah, that’s more like it. After aligning/filtering/pre-clustering:
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 552 221 0 3 1
2.5%-tile: 22 552 252 0 3 116830
25%-tile: 22 552 252 0 3 1168293
Median: 22 552 253 0 4 2336586
75%-tile: 22 552 253 0 4 3504879
97.5%-tile: 22 552 254 0 6 4556342
Maximum: 22 575 277 0 8 4673171
Mean: 21.9998 552.001 252.608 0 3.93159
# of unique seqs: 95508
total # of seqs: 4673171