Dear mothur community,
I am a PhD student in a lab with no bioinformatics experience at all, and I have ended up with a 16S dataset I'm supposed to analyse. I am working my way through the MiSeq SOP using Amazon Web Services and the 1.3.95 AMI. I ran all steps on an r4.xlarge instance (30.5 GB of RAM, 4 processors) until I reached dist.seqs, as described under the phylogenetics header. The resulting .dist file was ~40 GB in size (guess who ended up with the 2014 blog entry as well). So I switched to an r4.2xlarge instance with 61 GB of RAM, since I thought that would be enough to read in the matrix and proceed with creating the .tre file.
The issue now is that clearcut seems to stall. I'm running mothur in screen and regularly switch back to the terminal to monitor RAM usage with "top". For a while I can see RAM usage increasing, but then it stops at the same point in every run and nothing else happens (I watched this for several hours).
Does anyone know why this happens? Could something be corrupted in my .dist file that halts the reading process? Or is the .dist file simply too large for clearcut? Any ideas would be highly appreciated, since right now I'm stuck and have no idea how to proceed.
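In case it helps narrow down the "corrupted file" possibility, here is a minimal sketch of a structural check I could run on the .dist file. It assumes (my assumption, not confirmed above) that the file is a PHYLIP lower-triangle matrix as written by dist.seqs with output=lt: the first line holds the number of sequences N, and row i (counting from 0) holds a sequence name followed by exactly i distances. The function name check_lower_triangle is hypothetical. It streams the file line by line, so it does not need to hold the 40 GB matrix in memory, and it reports the first row that is short, non-numeric, or missing.

```python
from typing import Iterable, Tuple


def check_lower_triangle(lines: Iterable[str]) -> Tuple[bool, str]:
    """Stream a PHYLIP lower-triangle distance matrix and report the first problem.

    Assumed format (hypothetical check, not part of mothur):
      line 1:       number of sequences N
      row i (0..N-1): name followed by exactly i distances
    """
    it = iter(lines)
    header = next(it, "").strip()
    if not header.isdigit():
        return False, f"bad header: {header!r}"
    n = int(header)

    row = 0
    for line_no, line in enumerate(it, start=2):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if row >= n:
            return False, f"line {line_no}: more rows than the header's {n}"
        distances = fields[1:]
        if len(distances) != row:
            # A short row usually means the file was cut off mid-write.
            return False, (f"line {line_no}: expected {row} distances, "
                           f"found {len(distances)}")
        for d in distances:
            try:
                float(d)
            except ValueError:
                return False, f"line {line_no}: non-numeric distance {d!r}"
        row += 1

    if row != n:
        return False, f"only {row} of {n} rows present (file may be truncated)"
    return True, f"ok: {n} rows"


if __name__ == "__main__":
    import sys
    # Usage (the filename is whatever dist.seqs produced):
    #   python check_dist.py final.phylip.dist
    with open(sys.argv[1]) as fh:
        print(check_lower_triangle(fh))
```

If this reports "ok", the file is at least structurally complete, which would point more toward memory limits than corruption; it does not validate the distance values themselves.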
Best
Jonas