So I ran read.dist as I normally do, but this time the program keeps terminating before it finishes reading the distance file, which is surprisingly large (4+ GB). Is there a reason for this?
Yeah, you probably don’t have enough memory (RAM) in your computer. Perhaps you haven’t used the cutoff option in the dist.seqs or read.dist commands?
Is there another way I can read such a big file? I have already used cutoff=0.03, which gave me a distance file of slightly less than 3 GB, and 0.03 is about as low as I can go.
If you are using read.dist to cluster, you could use the hcluster command instead, http://www.mothur.org/wiki/Hcluster. It is light on memory and will be able to process your file.
Thank you so much!
Actually, I am not sure what the problem is now. After generating the distance file with dist.seqs using a 0.03 cutoff, I am unable to cluster that distance file (around 4 GB) with hcluster. Do you have any suggestions on what I might be doing wrong?
Do you have the correct file format? By default, dist.seqs creates a column-formatted distance matrix, and a common mistake is to read it in using the phylip parameter. If that's not your issue, can you send your logfile to mothur.bugs@gmail.com? I will take a look.
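One way to check which format a distance file is in, without loading all 4 GB, is to look at its first few lines. The sketch below is a hypothetical Python helper (not part of mothur), and it assumes the usual layouts: a column file has three whitespace-separated fields per line (two sequence names and a distance), while a PHYLIP file begins with a line holding only the number of sequences.

```python
from itertools import islice
from typing import Iterable


def _is_float(token: str) -> bool:
    """Return True if the token parses as a number."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def guess_distance_format(lines: Iterable[str], sample: int = 5) -> str:
    """Heuristically classify a distance matrix as 'phylip', 'column', or 'unknown'.

    Hypothetical helper: only the first `sample` non-empty lines are read,
    so it stays cheap even on multi-gigabyte files.
    """
    # Lazily split non-empty lines and take just the first few.
    nonempty = (line.split() for line in lines if line.strip())
    rows = list(islice(nonempty, sample))
    if not rows:
        return "unknown"
    # Assumed PHYLIP layout: the first line is only the sequence count.
    first = rows[0]
    if len(first) == 1 and first[0].isdigit():
        return "phylip"
    # Assumed column layout: name, name, distance on every line.
    if all(len(r) == 3 and _is_float(r[2]) for r in rows):
        return "column"
    return "unknown"
```

For example, `with open("final.dist") as fh: print(guess_distance_format(fh))` (the filename is illustrative). If it reports `column`, the file should be given to hcluster with `column=` rather than `phylip=`.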