Hi,
I am experiencing some difficulties getting past cluster.split in the MiSeq SOP. I sequenced the V4 region of the 16S rRNA gene on the MiSeq using the 2x250 kit. I followed the MiSeq SOP (mothur v.1.47.0) to make contigs and clean the dataset without any issues.
I have tried a number of commands in an attempt to produce OTUs:
make.file(inputdir=., type=gz, prefix=stability)
make.contigs(file=stability.files, maxambig=0, maxlength=292, maxhomop=8, processors=24)
summary.seqs(fasta=stability.trim.contigs.fasta, count=stability.contigs.count_table)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 37 37 0 3 1
2.5%-tile: 1 291 291 0 3 190879
25%-tile: 1 291 291 0 4 1908783
Median: 1 292 292 0 4 3817565
75%-tile: 1 292 292 0 5 5726347
97.5%-tile: 1 292 292 0 5 7444251
Maximum: 1 300 300 0 8 7635129
Mean: 1 291 291 0 4
# of unique seqs: 7635129
total # of seqs: 7635129
It took 83 secs to summarize 7635129 sequences.
mothur > unique.seqs(fasta=stability.trim.contigs.fasta, count=stability.contigs.count_table)
mothur > summary.seqs(fasta=stability.trim.contigs.unique.fasta, count=stability.trim.contigs.count_table)
Using 24 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 37 37 0 3 1
2.5%-tile: 1 291 291 0 3 190879
25%-tile: 1 291 291 0 4 1908783
Median: 1 292 292 0 4 3817565
75%-tile: 1 292 292 0 5 5726347
97.5%-tile: 1 292 292 0 5 7444251
Maximum: 1 300 300 0 8 7635129
Mean: 1 291 291 0 4
# of unique seqs: 2524391
total # of seqs: 7635129
mothur > align.seqs(fasta=stability.trim.contigs.unique.fasta, reference=/data/ref_database/mothur_silva/silva.nr_v138_1.align)
summary.seqs(fasta=stability.trim.contigs.unique.align, count=stability.trim.contigs.count_table)
Using 24 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1044 1048 1 0 1 1
2.5%-tile: 11895 25318 291 0 3 190879
25%-tile: 11895 25318 291 0 4 1908783
Median: 11895 25318 292 0 4 3817565
75%-tile: 11895 25318 292 0 5 5726347
97.5%-tile: 11895 25318 292 0 5 7444251
Maximum: 43116 43116 300 0 8 7635129
Mean: 11896 25318 291 0 4
# of unique seqs: 2524391
total # of seqs: 7635129
mothur > screen.seqs(fasta=stability.trim.contigs.unique.align, count=stability.trim.contigs.count_table, start=11895, end=25318)
summary.seqs(fasta=stability.trim.contigs.unique.good.align, count=stability.trim.contigs.good.count_table)
Using 24 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 10366 25318 260 0 3 1
2.5%-tile: 11895 25318 291 0 3 188773
25%-tile: 11895 25318 291 0 4 1887729
Median: 11895 25318 292 0 4 3775457
75%-tile: 11895 25318 292 0 5 5663185
97.5%-tile: 11895 25318 292 0 5 7362141
Maximum: 11895 25498 300 0 8 7550913
Mean: 11894 25318 291 0 4
# of unique seqs: 2466444
total # of seqs: 7550913
filter.seqs(fasta=stability.trim.contigs.unique.good.align, vertical=T, trump=.)
unique.seqs(fasta=stability.trim.contigs.unique.good.filter.fasta, count=stability.trim.contigs.good.count_table)
summary.seqs(fasta=stability.trim.contigs.unique.good.filter.unique.fasta, count=stability.trim.contigs.unique.good.filter.count_table)
Using 24 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 611 260 0 3 1
2.5%-tile: 1 629 291 0 3 188773
25%-tile: 1 629 291 0 4 1887729
Median: 1 629 292 0 4 3775457
75%-tile: 1 629 292 0 5 5663185
97.5%-tile: 1 629 292 0 5 7362141
Maximum: 1 629 300 0 8 7550913
Mean: 1 628 291 0 4
# of unique seqs: 2447732
total # of seqs: 7550913
pre.cluster(fasta=stability.trim.contigs.unique.good.filter.unique.fasta, count=stability.trim.contigs.unique.good.filter.count_table, diffs=3)
mothur > chimera.vsearch(fasta=stability.trim.contigs.unique.good.filter.unique.precluster.fasta, count=stability.trim.contigs.unique.good.filter.unique.precluster.count_table, dereplicate=t)
mothur > summary.seqs(fasta=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.fasta, count=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.count_table)
Using 24 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 611 260 0 3 1
2.5%-tile: 1 629 291 0 3 173851
25%-tile: 1 629 291 0 4 1738510
Median: 1 629 292 0 4 3477020
75%-tile: 1 629 292 0 5 5215530
97.5%-tile: 1 629 292 0 5 6780189
Maximum: 1 629 300 0 8 6954039
Mean: 1 628 291 0 4
# of unique seqs: 814566
total # of seqs: 6954039
classify.seqs(fasta=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.fasta, count=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.count_table, reference=/data/ref_database/mothur_silva/silva.nr_v138_1.align, taxonomy=/data/ref_database/mothur_silva/silva.nr_v138_1.tax)
mothur > remove.lineage(fasta=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.fasta, count=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.count_table, taxonomy=stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.nr_v138_1.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)
mothur > rename.file(fasta=current, count=current, taxonomy=current, prefix=final)
Using stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table as input file for the count parameter.
Using stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.pick.fasta as input file for the fasta parameter.
Using stability.trim.contigs.unique.good.filter.unique.precluster.denovo.vsearch.nr_v138_1.wang.pick.taxonomy as input file for the taxonomy parameter.
Next, I tried cluster.split several times in this way
(1) mothur > cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, taxlevel=4, processors=40)
(2) mothur > cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, taxlevel=4, processors=36)
(3) mothur > cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, taxlevel=5, processors=16)
(4) mothur > cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, taxlevel=6, processors=16)
(5)mothur > cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, taxlevel=6, processors=16, cutoff=0.10)
(6)mothur > cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, taxlevel=6, processors=8, cutoff=0.10)
But, the result is: Out of memory: Killed process 2276 (mothur) total-vm: 263820292kB, anin-rss: 127037696kB, file-rss: 348kB, shmem-rss:0kB, UID: 1003 pgtables: 501792kB oom_score_adj:0
Killed
How can I figure it out to get OTUs?
(Total RAM: 128GB, processors:48)
Thank you.