Different results from make.contigs in 1.47 vs older version

Hello!

I am updating a previous run to Mothur 1.47 so that I can start using it as a basis to build a database with cluster.fit.

There are some differences with version 1.47.

First, I am keeping a lot more sequences with the new version. Ambiguous bases also remain after make.contigs. So far these are the main differences. Any idea why? I am posting both logs for comparison. The one difference is that I am setting the seed to 100 in v1.47 before the contig, precluster and clustering steps for reproducibility. Is the fixed seed affecting the process?

Many thanks.

Old (1.44 I think)
Setting logfile name to megacampy_logFile_clustersplit

mothur > make.contigs(file=megacampy.files, oligos=primers.oligo.txt, checkorient=t, pdiffs=2, deltaq=5)

…

mothur > screen.seqs(fasta=current, group=current, summary=current, maxambig=0, maxhomop=70)

…

mothur > unique.seqs(fasta=current)

……

mothur > summary.seqs(fasta=current, count=current)

Using 32 processors.

             Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:     1      49   49      0       3        1
2.5%-tile:   1      252  252     0       3        1547809
25%-tile:    1      253  253     0       4        15478090
Median:      1      253  253     0       4        30956180
75%-tile:    1      253  253     0       5        46434269
97.5%-tile:  1      253  253     0       6        60364550
Maximum:     1      463  463     0       70       61912358
Mean:        1      252  252     0       4
# of unique seqs: 4672367
total # of seqs: 61912358

It took 150 secs to summarize 61912358 sequences.

mothur > align.seqs(fasta=current, reference=silva.nr_v132.pcr.align, flip=t)

……..

             Start  End    NBases  Ambigs  Polymer  NumSeqs
Minimum:     0      0      0       0       1        1
2.5%-tile:   1968   11550  252     0       3        1547809
25%-tile:    1968   11550  253     0       4        15478090
Median:      1968   11550  253     0       4        30956180
75%-tile:    1968   11550  253     0       5        46434269
97.5%-tile:  1968   11550  253     0       6        60364550
Maximum:     13425  13425  418     0       50       61912358
Mean:        1968   11549  252     0       4
# of unique seqs: 4672367
total # of seqs: 61912358

1.47
mothur > set.seed(seed=100)

Setting random seed to 100.

mothur > make.contigs(file=megacampy.files, oligos=primers.oligo.txt, checkorient=t, pdiffs=2, deltaq=5, maxambig=0, maxhomop=50)

……..

mothur > unique.seqs(fasta=current)

……

mothur > summary.seqs(fasta=current, count=current)

Using megacampy.trim.contigs.count_table as input file for the count parameter.

Using megacampy.trim.contigs.unique.fasta as input file for the fasta parameter.

Using 32 processors.

             Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:     1      23   23      0       3        1
2.5%-tile:   1      252  252     0       3        1968628
25%-tile:    1      253  253     0       4        19686276
Median:      1      253  253     0       4        39372552
75%-tile:    1      253  253     0       5        59058827
97.5%-tile:  1      253  253     12      6        76776475
Maximum:     1      463  463     189     230      78745102
Mean:        1      252  252     1       4
# of unique seqs: 18276278
total # of seqs: 78745102

……….

mothur > align.seqs(fasta=current, reference=silva.nr_v132.pcr.align, flip=t)

             Start  End    NBases  Ambigs  Polymer  NumSeqs
Minimum:     0      0      0       0       1        1
2.5%-tile:   1968   11550  252     0       3        1968628
25%-tile:    1968   11550  253     0       4        19686276
Median:      1968   11550  253     0       4        39372552
75%-tile:    1968   11550  253     0       5        59058827
97.5%-tile:  1968   11550  253     12      6        76776475
Maximum:     13425  13425  452     94      95       78745102
Mean:        1968   11549  252     1       4
# of unique seqs: 18276278
total # of seqs: 78745102

Hello, just an update.

So I stopped the job and restarted it with modifications. It turns out that the seed is not the problem. Adding maxambig=0 to screen.seqs (after alignment) put things back to normal. It looks like make.contigs is not able to remove all ambiguous bases.
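For anyone hitting the same behaviour, the extra screening step can be sketched roughly as below. This is a minimal sketch in batch-file form, not a copy of my log. It assumes the count table written by make.contigs is still the current count file. The maxhomop=50 value is carried over from my make.contigs call as an assumption, since the only change I actually tested was maxambig=0.

```
# align as before
align.seqs(fasta=current, reference=silva.nr_v132.pcr.align, flip=t)
# workaround: screen again after alignment, because make.contigs left ambiguous bases in
# maxhomop=50 is an assumption copied from the make.contigs call above
screen.seqs(fasta=current, count=current, maxambig=0, maxhomop=50)
# check that the Ambigs column is back to 0
summary.seqs(fasta=current, count=current)
```

If the screen works as intended, the Ambigs column in the final summary should be 0 for every row, as in the old run.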

Cheers!

Thanks for reporting this bug. The removal of ambiguous bases and homopolymers in make.contigs is not triggered unless you add the maxlength option. This bug will be corrected in our next release.

Glad I was of some use.
