Stuck at cluster.split -- how do I overcome RAM issue?

Hello,

I am experiencing some difficulties getting past cluster.split in the MiSeq SOP. I sequenced the V4 region of the 16S rRNA gene on the MiSeq (Illumina RTA v1.17.28; MCS v2.2) using the 2x250 kit. I did not run a mock community. I followed the MiSeq SOP (mothur v1.31.2) to make contigs and clean the dataset without any issues. After screen.seqs I had ~468,000 unique sequences.

I have tried a number of commands in an attempt to produce OTUs:

  1. I used dist.seqs with a cutoff of 0.20. This produced a .dist file, but the subsequent cluster command failed due to a lack of RAM (my .dist file was 650 GB, and I was using a computer with 16 GB of RAM).
  2. I ran cluster.split with a cutoff of 0.10, taxlevel=4, and my fasta file, but again this failed due to a lack of RAM.
  3. I repeated 2), but included large=T so it wouldn’t load the .dist file into RAM. This also failed; I suspect this is because I provided a fasta file instead of a distance file.
  4. I first ran cluster.split with cluster=F, which produced 53 .dist files. From this, I created a batch file so it would run cluster.split on each distance file with large=T. My largest .dist file was over 500 GB. This step also failed, and produced 2 errors. The log file of this attempt is included below, after a quick sketch of the batch file.
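
The batch file (DistanceFileBatch.txt) was essentially one cluster.split call per split distance file, along these lines (a sketch; only the first call appears in the log below, and the remaining 52 follow the same pattern):

set.dir(input=/Volumes/Untitled 1 1/mothur, output=/Volumes/Untitled 1/Amy)
cluster.split(column=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.0.dist, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, large=T, processors=8)
cluster.split(column=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.1.dist, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, large=T, processors=8)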

cd /volumes/"untitled 1 1"/mothur

./mothur DistanceFileBatch.txt

mothur > set.dir (input=/Volumes/Untitled 1 1/mothur, output=/Volumes/Untitled 1/Amy
Mothur’s directories:
outputDir=/Volumes/Untitled 1/Amy/
inputDir=/Volumes/Untitled 1 1/mothur/

mothur > cluster.split(column=stability.tim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.0.dist, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, large=T, processors=8)

Using 8 processors.
Using splitmethod distance.
Splitting the file…
It took 401316 seconds to split the distance file.
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.27.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.42.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.21.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.81.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.112.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.76.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.139.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.158.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.93.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.145.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.156.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.162.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.208.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.236.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.219.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.270.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.226.temp

Reading /Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.226.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.145.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.165.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.162.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.208.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.236.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.219.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.270.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.226.temp

Reading /Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.201.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.145.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.165.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.162.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.208.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.236.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.219.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.270.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.226.temp

Reading /Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.266.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.145.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.165.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.162.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.208.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.236.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.219.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.270.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.226.temp

Reading /Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.268.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.145.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.165.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.162.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.208.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.236.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.219.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.270.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.226.temp

Reading /Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.256.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.145.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.165.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.162.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.208.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.236.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.219.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.270.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.226.temp


Reading /Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.299.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.145.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.165.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.162.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.208.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.236.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.219.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.270.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.226.temp

Reading /Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.248.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.145.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.165.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.162.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.208.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.236.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.219.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.270.temp
/Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.226.temp

Reading /Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.24.temp
[ERROR]: Could not open /Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.248.temp
[ERROR]: Could not open /Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.201.temp
[ERROR]: Could not open /Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.266.temp
[ERROR]: Could not open /Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.229.temp
[ERROR]: Could not open /Volumes/Untitled 1 1/mothur/stability.trim.contigs.good.unique.good.filterunique.precluster.pick.pick.fasta.0.dist.24.temp
[ERROR]: Your count table contains more than 1 sequence named , sequence names must be unique. Please correct.
… (repeats this error thousands of times)

So, my questions are:
  1. What has produced these two errors?
  2. How should I alter my command lines?
  3. Is there a better way to cluster large distance matrices?
  4. Is there a foreseeable issue with subsampling my data before creating OTUs?

Thanks very much!!

Amy

Hi Amy,

So, a few answers and more questions… First, large=T is really a disaster. For average neighbor it just sucks up RAM and takes a lot longer than the default. I’m not sure what’s going on with the errors, but I suspect it has something to do with the large=T option being used.

What % PhiX were you using? How confident are you that 1.17.28/2.2 was used?
Did you follow the SOP exactly?
How many unique sequences did you have after pre.cluster?
Could you try taxlevel=5 or 6?


What's in the SOP should work...

cluster.split(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15)

Again, try taxlevel=5 or 6 and see what happens. We’ve successfully run this for studies with reads from multiple runs pooled together.
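
For example, keeping everything else in that command the same and just splitting at a finer taxonomic level (with the pds reference taxonomy, taxlevel=4 is roughly the order level, 5 family, and 6 genus):

cluster.split(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.taxonomy, splitmethod=classify, taxlevel=6, cutoff=0.15)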

Pat

I too am having similar problems, but with ~500,000 PacBio 18S sequences. I am using the latest Linux mothur and have been following the 454 SOP. With ~200,000 unique sequences, I first tried generating final.dist and then running cluster(column=final.dist, name=final.names). I was able to create a final.dist file of about 800 GB, but, not surprisingly, cluster was killed halfway through. I then decided to go down the cluster.split route:

cluster.split(fasta=final.fasta, taxonomy=final.taxonomy, name=final.names, taxlevel=3, processors=8)

After two days this was also killed, right at the end of the analysis, probably due to RAM issues:

Clustering final.fasta.2.dist
|||||||


Clustering final.fasta.3.dist
Cutoff was 0.255 changed cutoff to 0.12
Cutoff was 0.255 changed cutoff to 0.13
Killed
[jonesbe@aqua BMJfasta]$ Cutoff was 0.255 changed cutoff to 0.11
Cutoff was 0.255 changed cutoff to 0.13

I am VPNing into my research group’s computer, which has a lot of storage (>5 TB), but I am not sure how much RAM it has. It is used for satellite data analysis, but I suspect it doesn’t have enough for what I’ve been trying to do.

Would you recommend that I try a different taxonomic level, or perhaps start again with non-pooled fasta files? Or perhaps try the Illumina SOP with count_tables?

Thanks for any advice!

Bethan

Yeah - try taxlevel=5
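
That is, the same command you ran, just split at a finer level:

cluster.split(fasta=final.fasta, taxonomy=final.taxonomy, name=final.names, taxlevel=5, processors=8)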

Thanks! Will let you know how it goes.

Bethan

Hi Pat,

Unfortunately it was still killed… Do you think I should split the input file, or perhaps try the Illumina method? Does using a count_table change the amount of RAM needed?

Bethan

(I am going to try taxlevel=6 in the meantime)

Yeah, you could try the count_table approach. Based on my previous work with PacBio, my suspicion is that the data is of a pretty poor quality and that is mucking up the works for you.
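
Roughly, the count_table route would be to collapse your names file (and a groups file, if you have one - final.groups here is just a guess at the name) into a count_table, and then give that to cluster.split instead of the names file:

count.seqs(name=final.names, group=final.groups)
cluster.split(fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, splitmethod=classify, taxlevel=6, processors=8)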

Pat

Thanks Pat, I’ll give it a go and write back about what happened.
Bethan

Hi Pat,

Thanks very much for your reply. I went ahead and tried cluster.split again using taxlevel=5, with 1 processor on a computer with 16GB of RAM. It was my hope that allocating all the RAM to one processor would overcome any RAM issues. After running cluster.split for 2 days, the computer produced the following error: “The Mac OS X startup disk has no more space available for application memory”. It did not finish the command.

Here are some of the answers to the questions you asked previously:
Q: What % PhiX were you using? How confident are you that 1.17.28/2.2 was used?
A: For the % PhiX, we aim to get between 5 and 10% using the new software, but in order to get that we have actually been spiking around 20%. The sequencing facility informed us that 1.17.28/2.2 was being used.

Q: Did you follow the SOP exactly?
A: Yes, I followed the MiSeq SOP exactly

Q: How many unique sequences did you have after pre.cluster?
A: I did not run summary.seqs immediately after pre.cluster. The next time I ran summary.seqs was after remove.seqs, and at that stage I had 467,407 unique sequences.

Q: Could you try taxlevel=5 or 6?
A: Yes. I tried taxlevel=5 with no success.


Here are some more questions for you:

  1. How much RAM and how many processors do you have in your computers?
  2. For cluster.split do you use all of your processors?
  3. I currently have access to a MacPro with 16 GB of RAM, but we can update it to 64 GB if it would help. Do you think this would help resolve some of our issues?
  4. Can I subsample before running cluster.split?
Thanks again for your help,

Amy

  1. How much RAM and how many processors do you have in your computers?

So the computer that we did the paper on had 48 GB of RAM, but I’m pretty sure it didn’t use all of it.

  2. For cluster.split do you use all of your processors?

Not necessarily; if there are some ginormous distance matrices we might turn that down a bit.

  3. I currently have access to a MacPro with 16 GB of RAM, but we can update it to 64 GB if it would help. Do you think this would help resolve some of our issues?

It might, but I can’t give too definitive of an answer.

  4. Can I subsample before running cluster.split?

Yup.
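
Something like this before cluster.split would do it (the size here is just an illustration; the fasta and count_table are the files you’d otherwise feed to cluster.split):

sub.sample(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.pick.count_table, size=10000)

Then run cluster.split on the subsampled fasta and count_table it writes out.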

Something that occurs to me is that the error makes me think you have swamped the hard drive (i.e., run out of disk space). Is this possible? Could you also try running cluster.split as you have it, but adding cutoff=0.20?
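
That is, the same style of command as above, just with an explicit cutoff (keep whatever taxlevel you used last; taxlevel=5 shown here):

cluster.split(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.pick.count_table, taxonomy=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.pick.taxonomy, splitmethod=classify, taxlevel=5, cutoff=0.20)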

Pat

Pat,

Just an update on the issues Amy was having. Various combinations of taxlevel, cutoff, and processors didn’t work (12-core 3.06 GHz MacPro with 16 GB RAM). I sucked it up and upgraded to 96 GB of RAM (gotta love 3rd-party RAM) and we were able to process it at taxlevel=4, cutoff=0.10, and 12 processors in ~40 hours.

Scott

Ok, thanks for the update - glad it worked out. I’m pretty skeptical of the results as we’ve not had a problem with the new software and 5-10% PhiX.