Cluster.split issues: out of memory

AliceBaek · July 14, 2022, 6:48am

Hi,

I am experiencing some difficulties getting past cluster.split in the MiSeq SOP. I sequenced the V4 region of the 16S rRNA gene on the MiSeq using the 2x250 kit. I followed the MiSeq SOP (mothur v.1.47.0) to make contigs and clean the dataset without any issues.
I have tried a number of commands in an attempt to produce OTUs:
make.file(inputdir=., type=gz, prefix=stability)
make.contigs(file=stability.files, maxambig=0, maxlength=292, maxhomop=8, processors=24)
summary.seqs(fasta=stability.trim.contigs.fasta, count=stability.contigs.count_table)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 37 37 0 3 1
2.5%-tile: 1 291 291 0 3 190879
25%-tile: 1 291 291 0 4 1908783
Median: 1 292 292 0 4 3817565
75%-tile: 1 292 292 0 5 5726347
97.5%-tile: 1 292 292 0 5 7444251
Maximum: 1 300 300 0 8 7635129
Mean: 1 291 291 0 4

# of unique seqs: 7635129

total # of seqs: 7635129

It took 83 secs to summarize 7635129 sequences.

mothur > unique.seqs(fasta=stability.trim.contigs.fasta, count=stability.contigs.count_table)
mothur > summary.seqs(fasta=stability.trim.contigs.unique.fasta, count=stability.trim.contigs.count_table)

Using 24 processors.

	Start	End	NBases	Ambigs	Polymer	NumSeqs

Minimum: 1 37 37 0 3 1
2.5%-tile: 1 291 291 0 3 190879
25%-tile: 1 291 291 0 4 1908783
Median: 1 292 292 0 4 3817565
75%-tile: 1 292 292 0 5 5726347
97.5%-tile: 1 292 292 0 5 7444251
Maximum: 1 300 300 0 8 7635129
Mean: 1 291 291 0 4

# of unique seqs: 2524391

total # of seqs: 7635129

mothur > align.seqs(fasta=stability.trim.contigs.unique.fasta, reference=/data/ref_database/mothur_silva/silva.nr_v138_1.align)
summary.seqs(fasta=stability.trim.contigs.unique.align, count=stability.trim.contigs.count_table)

Using 24 processors.

	Start	End	NBases	Ambigs	Polymer	NumSeqs

Minimum: 1044 1048 1 0 1 1
2.5%-tile: 11895 25318 291 0 3 190879
25%-tile: 11895 25318 291 0 4 1908783
Median: 11895 25318 292 0 4 3817565
75%-tile: 11895 25318 292 0 5 5726347
97.5%-tile: 11895 25318 292 0 5 7444251
Maximum: 43116 43116 300 0 8 7635129
Mean: 11896 25318 291 0 4

# of unique seqs: 2524391

total # of seqs: 7635129

mothur > screen.seqs(fasta=stability.trim.contigs.unique.align, count=stability.trim.contigs.count_table, start=11895, end=25318)
summary.seqs(fasta=stability.trim.contigs.unique.good.align, count=stability.trim.contigs.good.count_table)

Using 24 processors.

	Start	End	NBases	Ambigs	Polymer	NumSeqs

Minimum: 10366 25318 260 0 3 1
2.5%-tile: 11895 25318 291 0 3 188773
25%-tile: 11895 25318 291 0 4 1887729
Median: 11895 25318 292 0 4 3775457
75%-tile: 11895 25318 292 0 5 5663185
97.5%-tile: 11895 25318 292 0 5 7362141
Maximum: 11895 25498 300 0 8 7550913
Mean: 11894 25318 291 0 4

# of unique seqs: 2466444

total # of seqs: 7550913

filter.seqs(fasta=stability.trim.contigs.unique.good.align, vertical=T, trump=.)
unique.seqs(fasta=stability.trim.contigs.unique.good.filter.fasta, count=stability.trim.contigs.good.count_table)
summary.seqs(fasta=stability.trim.contigs.unique.good.filter.unique.fasta, count=stability.trim.contigs.unique.good.filter.count_table)

Using 24 processors.

	Start	End	NBases	Ambigs	Polymer	NumSeqs

Minimum: 1 611 260 0 3 1
2.5%-tile: 1 629 291 0 3 188773
25%-tile: 1 629 291 0 4 1887729
Median: 1 629 292 0 4 3775457
75%-tile: 1 629 292 0 5 5663185
97.5%-tile: 1 629 292 0 5 7362141
Maximum: 1 629 300 0 8 7550913
Mean: 1 628 291 0 4

# of unique seqs: 2447732

total # of seqs: 7550913

pre.cluster(fasta=stability.trim.contigs.unique.good.filter.unique.fasta, count=stability.trim.contigs.unique.good.filter.count_table, diffs=3)
mothur > chimera.vsearch(fasta=stability.trim.contigs.unique.good.filter.unique.precluster.fasta, count=stability.trim.contigs.unique.good.filter.unique.precluster.count_table, dereplicate=t)
mothur > summary.seqs(fasta=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.fasta, count=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.count_table)

Using 24 processors.

	Start	End	NBases	Ambigs	Polymer	NumSeqs

Minimum: 1 611 260 0 3 1
2.5%-tile: 1 629 291 0 3 173851
25%-tile: 1 629 291 0 4 1738510
Median: 1 629 292 0 4 3477020
75%-tile: 1 629 292 0 5 5215530
97.5%-tile: 1 629 292 0 5 6780189
Maximum: 1 629 300 0 8 6954039
Mean: 1 628 291 0 4

# of unique seqs: 814566

total # of seqs: 6954039

classify.seqs(fasta=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.fasta, count=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.count_table, reference=/data/ref_database/mothur_silva/silva.nr_v138_1.align, taxonomy=/data/ref_database/mothur_silva/silva.nr_v138_1.tax)
mothur > remove.lineage(fasta=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.fasta, count=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.count_table, taxonomy=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.nr_v138_1.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)
mothur > rename.file(fasta=current, count=current, taxonomy=current, prefix=final)
Using stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table as input file for the count parameter.
Using stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.fasta as input file for the fasta parameter.
Using stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.nr_v138_1.wang.pick.taxonomy as input file for the taxonomy parameter.

Next, I tried cluster.split several times, as follows:
(1) mothur > cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, taxlevel=4, processors=40)
(2) mothur > cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, taxlevel=4, processors=36)
(3) mothur > cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, taxlevel=5, processors=16)
(4) mothur > cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, taxlevel=6, processors=16)

But the result is: Out of memory: Killed process 2276 (mothur) total-vm: 263820292kB, anon-rss: 127037696kB, file-rss: 348kB, shmem-rss:0kB, UID: 1003 pgtables: 501792kB oom_score_adj:0
Killed
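
For readers unfamiliar with the kernel's out-of-memory message, here is a minimal Python sketch that converts its kB figures into GiB. It assumes the kernel's "kB" means 1024 bytes and uses the usual kernel spelling of the field name, anon-rss; the helper names are made up for this illustration.

```python
import re

# The memory fields from the kernel message quoted above
# (anon-rss spelled the way the kernel normally prints it).
OOM_LINE = (
    "Out of memory: Killed process 2276 (mothur) total-vm: 263820292kB, "
    "anon-rss: 127037696kB, file-rss: 348kB, shmem-rss:0kB"
)


def parse_kb_fields(line):
    """Return a dict of {field name: size in kB} for every 'name: <n>kB' pair."""
    pairs = re.findall(r"([a-z-]+):\s*(\d+)kB", line)
    return {name: int(value) for name, value in pairs}


def kb_to_gib(kb):
    """Convert kB to GiB, assuming 1 kB = 1024 bytes as in kernel logs."""
    return kb / (1024 * 1024)


fields = parse_kb_fields(OOM_LINE)
for name, kb in fields.items():
    print(f"{name}: {kb_to_gib(kb):.1f} GiB")
```

Read this way, the resident memory (anon-rss) of the mothur process was already about as large as the machine's 128 GB of RAM when it was killed, so lowering the processor count alone is unlikely to help.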

How can I solve this problem to get OTUs?

(Total RAM: 128GB, processors:48)

Thank you

pschloss · August 16, 2022, 5:43pm

Hi - it looks like your V4 sequences still have the barcodes and primers attached, since each contig should be about 252 nt long. I suspect that once you remove them, the number of unique sequences will drop significantly, reducing the amount of RAM required.

Pat
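
Both points in the reply above can be checked with quick arithmetic. The Python sketch below is only an illustration. The primer sequences are an assumption (the common 515F/806R V4 pair; the thread never says which primers were used), and the pair count is a worst-case upper bound, since cluster.split divides the sequences by taxonomy and does not have to hold every pairwise distance at once.

```python
# ASSUMPTION: primers are not given in the thread. These are the widely used
# 515F/806R V4 primers, included only to illustrate the length check.
FORWARD_PRIMER = "GTGCCAGCMGCCGCGGTAA"
REVERSE_PRIMER = "GGACTACHVGGGTWTCTAAT"

EXPECTED_V4_LENGTH = 252    # approximate primer-free contig length (from the reply)
OBSERVED_MEDIAN = 292       # median NBases reported by summary.seqs in the question
UNIQUE_SEQS = 814566        # unique sequences after chimera.vsearch in the question


def extra_bases(observed, expected):
    """Number of bases by which the contigs exceed the expected length."""
    return observed - expected


def max_pairwise_distances(n_unique):
    """Upper bound on pairwise comparisons among n sequences: n * (n - 1) / 2."""
    return n_unique * (n_unique - 1) // 2


extra = extra_bases(OBSERVED_MEDIAN, EXPECTED_V4_LENGTH)
primer_total = len(FORWARD_PRIMER) + len(REVERSE_PRIMER)
print(f"Extra bases per contig: {extra}; assumed primers total: {primer_total}")

pairs_now = max_pairwise_distances(UNIQUE_SEQS)
pairs_half = max_pairwise_distances(UNIQUE_SEQS // 2)
print(f"Worst-case pairs with {UNIQUE_SEQS} uniques: {pairs_now:,}")
print(f"Worst-case pairs with half as many uniques: {pairs_half:,}")
```

The surplus over 252 nt is about the combined length of a typical primer pair, which fits the diagnosis. Because the pair count grows with the square of the number of unique sequences, halving the uniques cuts the worst case to roughly a quarter.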
