Hello!
I am updating a previous run to Mothur 1.47 so that I can start using it as a basis to build a database with cluster.fit.
There are some differences with version 1.47.
First, I am keeping a lot more sequences with the new version. There are also ambiguous bases remaining after make.contigs, even though I set maxambig=0 there. So far these are the main differences. Any idea why? I am posting both logs for comparison. The one difference is that in v1.47 I set the seed to 100 before the contig, precluster and clustering steps for reproducibility. Could the fixed seed be affecting the process?
Many thanks.
Old (1.44 I think)
Setting logfile name to megacampy_logFile_clustersplit
mothur > make.contigs(file=megacampy.files, oligos=primers.oligo.txt, checkorient=t, pdiffs=2, deltaq=5)
…
mothur > screen.seqs(fasta=current, group=current, summary=current, maxambig=0, maxhomop=70)
…
unique.seqs(fasta=current)
……
mothur > summary.seqs(fasta=current, count=current)
Using 32 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 49 49 0 3 1
2.5%-tile: 1 252 252 0 3 1547809
25%-tile: 1 253 253 0 4 15478090
Median: 1 253 253 0 4 30956180
75%-tile: 1 253 253 0 5 46434269
97.5%-tile: 1 253 253 0 6 60364550
Maximum: 1 463 463 0 70 61912358
Mean: 1 252 252 0 4
# of unique seqs: 4672367
total # of seqs: 61912358
It took 150 secs to summarize 61912358 sequences.
align.seqs(fasta=current, reference=silva.nr_v132.pcr.align, flip=t)
……..
Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 1968 11550 252 0 3 1547809
25%-tile: 1968 11550 253 0 4 15478090
Median: 1968 11550 253 0 4 30956180
75%-tile: 1968 11550 253 0 5 46434269
97.5%-tile: 1968 11550 253 0 6 60364550
Maximum: 13425 13425 418 0 50 61912358
Mean: 1968 11549 252 0 4
# of unique seqs: 4672367
total # of seqs: 61912358
1.47
mothur > set.seed(seed=100)
Setting random seed to 100.
mothur > make.contigs(file=megacampy.files, oligos=primers.oligo.txt, checkorient=t, pdiffs=2, deltaq=5, maxambig=0, maxhomop=50)
……..
mothur > unique.seqs(fasta=current)
……
mothur > summary.seqs(fasta=current, count=current)
Using megacampy.trim.contigs.count_table as input file for the count parameter.
Using megacampy.trim.contigs.unique.fasta as input file for the fasta parameter.
Using 32 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 23 23 0 3 1
2.5%-tile: 1 252 252 0 3 1968628
25%-tile: 1 253 253 0 4 19686276
Median: 1 253 253 0 4 39372552
75%-tile: 1 253 253 0 5 59058827
97.5%-tile: 1 253 253 12 6 76776475
Maximum: 1 463 463 189 230 78745102
Mean: 1 252 252 1 4
# of unique seqs: 18276278
total # of seqs: 78745102
……….
mothur > align.seqs(fasta=current, reference=silva.nr_v132.pcr.align, flip=t)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 1968 11550 252 0 3 1968628
25%-tile: 1968 11550 253 0 4 19686276
Median: 1968 11550 253 0 4 39372552
75%-tile: 1968 11550 253 0 5 59058827
97.5%-tile: 1968 11550 253 12 6 76776475
Maximum: 13425 13425 452 94 95 78745102
Mean: 1968 11549 252 1 4
# of unique seqs: 18276278
total # of seqs: 78745102
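To put numbers on the difference, here is a quick Python sketch that compares the two runs. It only uses the totals copied from the summary.seqs output after unique.seqs above; the variable and function names are mine and not part of mothur.

```python
# Totals copied by hand from the two summary.seqs outputs posted above.
# The dictionary layout and names are illustrative only, not mothur output.
runs = {
    "old": {"total": 61_912_358, "unique": 4_672_367},
    "1.47": {"total": 78_745_102, "unique": 18_276_278},
}


def unique_fraction(run):
    """Share of reads that are distinct sequences in one run."""
    return run["unique"] / run["total"]


# How many more reads and unique sequences the 1.47 run keeps.
extra_total = runs["1.47"]["total"] - runs["old"]["total"]
extra_unique = runs["1.47"]["unique"] - runs["old"]["unique"]

print(f"extra reads kept in 1.47:  {extra_total:,}")
print(f"extra unique seqs in 1.47: {extra_unique:,}")
for name, run in runs.items():
    print(f"{name}: unique fraction = {unique_fraction(run):.3f}")
```

So the 1.47 run keeps about 16.8 million more reads and about 13.6 million more unique sequences than the old one, and the share of unique sequences goes from roughly 7.5% to roughly 23%.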