hcluster - how to specify output not to /tmp

Hi everyone,

When we run the cluster() command on our data set, it consumes more than 32 GB of RAM, so we are using the hcluster() command to keep our machine from running into swap space. However, the temporary files it writes fill up our /tmp filesystem, causing the command to fail:

sort: write failed: /tmp/sortp1wSqa: No space left on device

When running the hcluster() command, is it possible to specify that the temporary files be written to a filesystem or directory other than /tmp?

Short of resizing this filesystem, I’ve tried setting the $TMPDIR environment variable, which seems to have no effect, and I can’t see a runtime option in the mothur manual, nor a compilation option.

Any hints?

cheers,
Dave

The hcluster command has a sorted parameter. You can sort your column-formatted distance file yourself and then set the sorted parameter to true; mothur will then skip the sort and just cluster. The command would look like hcluster(column=yourSortedFile, name=yourNameFile, sorted=true).
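
For anyone following along, the pre-sort could be done along these lines. This is only a sketch under a few assumptions: the file names and the scratch directory are made-up placeholders, the sort flags are the ones mothur is said to use further down in this thread (written here in the POSIX form -k 3), and LC_ALL=C is an addition so that "." is read as the decimal point.

```shell
#!/bin/sh
# Sketch: pre-sort a column-format distance file outside mothur.
# All file and directory names below are hypothetical placeholders.
set -eu
export LC_ALL=C                 # assumption: make sure '.' is the decimal point

workdir=$(mktemp -d)            # stand-in for a directory with plenty of free space
dist="$workdir/example.dist"    # hypothetical column-format distance file
sorted="$workdir/example.sorted.dist"

# Tiny example input, one pair per line: seqA seqB distance
printf 'seqA seqB 0.30\nseqA seqC 0.10\nseqB seqC 0.20\n' > "$dist"

# -T puts sort's temporary files in $workdir instead of the default location;
# -n -k 3 sorts numerically on the key starting at the third field (the distance).
sort -T "$workdir" -n -k 3 "$dist" -o "$sorted"

cat "$sorted"

# Then, inside mothur, using the command from the reply above:
#   hcluster(column=example.sorted.dist, name=yourNameFile, sorted=true)
```

With a real distance file, workdir would point at a large filesystem rather than a mktemp directory.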

Dave,
To follow up on what Sarah wrote - I think the step that is failing for you is the sort. To do the file sort we use the sort command that is built into OS X/Linux. A couple of suggestions/questions…

  1. Try running… sort -n -k +3 distFileName -o outfileName - does this give you the same error? (it should)
  2. Try running… sort -T pathToSomeOtherDirectory -n -k +3 distFileName -o outfileName - does this give you the same error? (it should work)

If #1 doesn’t fail, we’ll go back to the drawing board.
If #2 fails, you might double-check that you are allowed to create large files on the computer you’re using (I assume you can) and that you’ve got plenty of hard drive space. Alternatively, try a different path.
If #2 works, then open readcluster.cpp in the mothur source code and replace line 44 with…

string command = "sort -T pathToSomeOtherDirectory -n -k +3 " + distFile + " -o " + outfile;

Then recompile by typing make. Try the command again and let us know what happens.
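
Before touching the source, the two tests above can be wrapped in a small script that also reports free space. This is a rough sketch, not something from the mothur distribution: the tiny input file and the alt_tmp directory are placeholders, the key is written as -k 3, and on a real run dist would be the actual distance file and alt_tmp a directory on a larger filesystem.

```shell
#!/bin/sh
# Sketch: compare sort using its default temp directory against sort -T.
# Paths are hypothetical placeholders.
export LC_ALL=C                 # assumption: make sure '.' is the decimal point

alt_tmp=$(mktemp -d)            # stand-in for "pathToSomeOtherDirectory"
dist="$alt_tmp/example.dist"    # stand-in for "distFileName"
printf 'seqA seqB 0.30\nseqA seqC 0.10\n' > "$dist"

# Free space (in 1024-byte blocks) on the default and alternative locations.
df -Pk "${TMPDIR:-/tmp}" "$alt_tmp"

# Test 1: sort with its default temporary directory.
if sort -n -k 3 "$dist" -o "$alt_tmp/out1"; then test1=ok; else test1=failed; fi

# Test 2: temporary files redirected with -T.
if sort -T "$alt_tmp" -n -k 3 "$dist" -o "$alt_tmp/out2"; then test2=ok; else test2=failed; fi

echo "test 1 (default temp dir): $test1"
echo "test 2 (-T $alt_tmp): $test2"
```

On the toy input both tests pass; the difference only shows up when the file is large enough for sort to spill to disk.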

Pat

Hi Sarah and Pat,

Thanks very much for the prompt replies. I tried test 1 and it failed as expected. Test 2 was successful. Indeed, the man page for sort states that the temporary directory can be set with the $TMPDIR environment variable, so I tried this again. Embarrassingly, I found an error in my batch script; after fixing it and submitting again, it now works well! So no need for me to modify the source code.
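
For reference, a job script that sets TMPDIR might look roughly like the sketch below. The scratch directory and the batch file name are placeholders, and the export check only illustrates one common pitfall (assigning TMPDIR without exporting it); it is not necessarily the error mentioned above.

```shell
#!/bin/sh
# Sketch: redirect sort's temporary files through TMPDIR in a job script.
scratch=$(mktemp -d)   # hypothetical stand-in for a large scratch directory

# TMPDIR has to be exported, not just assigned, or child processes
# (mothur, and the sort it presumably launches) will not see it.
TMPDIR="$scratch"
export TMPDIR

# Confirm that a child process sees the exported value.
child_view=$(sh -c 'printf "%s" "$TMPDIR"')
echo "child sees TMPDIR=$child_view"

# mothur would then be started from this same script, for example:
#   mothur yourBatchFile     (hypothetical batch file name)
```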

Cheers!
Dave