I am running mothur on a server (Xeon 2.8 GHz, 32 GB RAM) to assign OTUs for a 454 sequencing data set. The data set contains around 30,000 sequences (~4 GB distance matrix). It has been running for 30 hours and still has not finished. Almost all of the RAM is being used by mothur, but very little CPU (<2%). Is this normal?
I also tried “hcluster” on another cluster with the same data. I just received an email from IT telling me that my program caused a serious problem on the node, which behaved as if it were broken, and that the program was deleted. I have, in fact, successfully analyzed a smaller data set (around 21,000 sequences) using mothur. Has anybody succeeded with a data set of this size?
(mothur was compiled following the 64-bit instructions.)
Thank you!
Stone
Do you have tag sequences? You might try using filter.seqs(fasta=youraligned.sequences, trump=.) and then unique.seqs(fasta=youraligned.filtered.sequences, names=namesfile) after aligning the sequences, if you haven’t already.
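To illustrate what the de-duplication step buys you, here is a minimal Python sketch of collapsing identical aligned sequences into one representative each, while keeping track of which reads each representative stands for. It is only an illustration of the idea, not mothur's implementation; the function name `collapse_identical` and the in-memory record format are assumptions made for the example.

```python
def collapse_identical(records):
    """Collapse identical sequences.

    records: iterable of (name, aligned_sequence) pairs.
    Returns (unique_records, names_rows), where names_rows pairs each
    representative name with a comma-separated list of every read it covers.
    """
    groups = {}  # sequence -> list of read names, in first-seen order
    for name, seq in records:
        groups.setdefault(seq, []).append(name)

    # The first read seen for each distinct sequence acts as the representative.
    unique_records = [(names[0], seq) for seq, names in groups.items()]
    names_rows = [(names[0], ",".join(names)) for names in groups.values()]
    return unique_records, names_rows


if __name__ == "__main__":
    reads = [("a", "AC-G"), ("b", "AC-G"), ("c", "ACTG")]
    unique_records, names_rows = collapse_identical(reads)
    for rep, members in names_rows:
        print(rep, members, sep="\t")
```

Fewer unique sequences means a smaller distance matrix, which is why this step matters before clustering.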
Mothur should definitely be able to handle a 30,000 sequence distance matrix.
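As a rough sanity check on the sizes involved, 30,000 sequences give 30,000 × 29,999 / 2 = 449,985,000 pairwise distances. The sketch below turns that count into a byte estimate. The bytes-per-value figure is an assumed parameter (for example, 4 for a single-precision float), not a statement about how mothur actually stores distances, and the helper names are made up for this example.

```python
def pair_count(n):
    """Number of distinct sequence pairs among n sequences."""
    return n * (n - 1) // 2


def estimate_bytes(n, bytes_per_value, square=False):
    """Estimate storage for a distance matrix of n sequences.

    square=False counts only the lower triangle; square=True counts
    the full n x n matrix. bytes_per_value is an assumption you supply.
    """
    entries = n * n if square else pair_count(n)
    return entries * bytes_per_value


if __name__ == "__main__":
    n = 30_000
    for label, square in (("lower triangle", False), ("square", True)):
        gb = estimate_bytes(n, 4, square) / 1e9
        print(f"{label}: {gb:.1f} GB at 4 bytes per value")
```

Because the count grows with the square of the number of sequences, going from 21,000 to 30,000 sequences roughly doubles the number of distances.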
The barcodes have been removed from my samples, and the unique sequences were selected with CD-HIT. The unique sequences were aligned with the Greengenes NAST aligner. The distance matrix was then computed in ARB, and the PHYLIP-format distance matrix was used to assign OTUs in mothur.
Actually, I also tried to align these sequences using mothur. However, it had not finished after 2 days.
I suspect you’ve got a formatting issue. Can you post your logfile? There should be no problems with unique.seqs, align.seqs, dist.seqs, or cluster. I would argue that there are limitations (some of them significant) with each of the other methods you are using as workarounds, when mothur should be able to do things just fine. Are you following the Costello analysis example listed on the wiki?
When you run read.dist are you reading the arb matrix in as a phylip or column-formatted matrix? When you’re using arb to generate the matrix you want to use the phylip option.
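If you are unsure which format a distance file is in, a quick look at its first line usually settles it. The sketch below is a heuristic, assuming the common conventions: a PHYLIP distance file starts with a line containing only the number of sequences, and a column-formatted file has three whitespace-separated fields per line (two sequence names and a distance). The function name `guess_distance_format` is made up for this example.

```python
def guess_distance_format(first_lines):
    """Guess 'phylip', 'column', or 'unknown' from the first lines of a file."""
    lines = [ln.strip() for ln in first_lines if ln.strip()]
    if not lines:
        return "unknown"

    head = lines[0].split()

    # PHYLIP: the first line holds only the sequence count.
    if len(head) == 1 and head[0].isdigit():
        return "phylip"

    # Column format: name1 name2 distance
    if len(head) == 3:
        try:
            float(head[2])
        except ValueError:
            return "unknown"
        return "column"

    return "unknown"


if __name__ == "__main__":
    print(guess_distance_format(["3", "seqA", "seqB 0.1", "seqC 0.2 0.3"]))
    print(guess_distance_format(["seqA seqB 0.05"]))
```

To use it on a real file, pass in the first few lines, for example via `itertools.islice(open(path), 5)`.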
Here is the logfile; it doesn’t show anything abnormal. It just got stuck at the cluster() step for 2 days. It also used too much RAM (more than 30 GB), so I killed the program yesterday. I then tried DOTUR last evening on the same server, and it finished the OTU assignment step this morning.
mothur > read.dist(phylip=all_16S_final.nast.dist)
********************###########
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||
It took 164 secs to read
mothur > cluster()
Could you send the logfile and distance matrix to mothur.bugs@gmail.com?
Thanks,
Pat