I’m analyzing a dataset with 12 groups and on the order of half a million sequences. When I get to the pre.cluster step I have to reduce the number of processors significantly for it to complete. For example, on a 48-core system with 1.5 TB RAM, if I try to use all 48 processors I get:
*** glibc detected *** mothur: double free or corruption (fasttop): 0x00002ae840003280 ***
======= Backtrace: =========
/opt/glibc-2.14/build/lib/libc.so.6(+0x7374e)[0x2ae81440c74e]
/opt/glibc-2.14/build/lib/libc.so.6(cfree+0x6c)[0x2ae81441074c]
/act/gcc-4.9.2/lib64/libstdc++.so.6(_ZNSs6assignERKSs+0x87)[0x2ae813a94107]
mothur[0x41b1e3]
mothur[0x85064c]
mothur[0xcbcaa9]
mothur[0x13d991f]
/opt/glibc-2.14/build/lib/libpthread.so.0(+0x6e2b)[0x2ae814182e2b]
/opt/glibc-2.14/build/lib/libc.so.6(clone+0x6d)[0x2ae81447131d]
======= Memory map: ========
00400000-016c0000 r-xp 00000000 00:13 32755946 /gpfs0/export/opt/mothur/1.42.3/mothur
018bf000-018c1000 r--p 012bf000 00:13 32755946 /gpfs0/export/opt/mothur/1.42.3/mothur
018c1000-018c7000 rw-p 012c1000 00:13 32755946 /gpfs0/export/opt/mothur/1.42.3/mothur
With fewer processors, more of the analysis completes before the crash; e.g., with 20:
Using 20 processors.
Reducing processors to 12.
/******************************************/
Running command: split.groups(groups=lbc10, fasta=combined.good.unique.good.good.filter.unique.fasta, count=combined.good.unique.good.good.filter.count_table)
/******************************************/
Running command: split.groups(groups=lbc9, fasta=combined.good.unique.good.good.filter.unique.fasta, count=combined.good.unique.good.good.filter.count_table)
/******************************************/
Running command: split.groups(groups=libc11, fasta=combined.good.unique.good.good.filter.unique.fasta, count=combined.good.unique.good.good.filter.count_table)
/******************************************/
Running command: split.groups(groups=libc12, fasta=combined.good.unique.good.good.filter.unique.fasta, count=combined.good.unique.good.good.filter.count_table)
/var/spool/slurmd/job1933416/slurm_script: line 101: 23939 Aborted (core dumped) mothur mothurpacbio-multisample.sop
The analysis only completes if I ramp it down to 2 processors. RAM consumption never comes anywhere close to the available memory at any point. Any ideas why this would happen?
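In case it matters, this is the shape of the pre.cluster invocation I'm using — a simplified sketch, where the diffs value is illustrative rather than my exact parameter, and the filenames follow the outputs shown above:

```
# Sketch of the pre.cluster call (diffs value illustrative);
# lowering processors= is the only change that lets it finish.
mothur "#pre.cluster(fasta=combined.good.unique.good.good.filter.unique.fasta, count=combined.good.unique.good.good.filter.count_table, diffs=2, processors=2)"
```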