Problem with OTU classification

stoneraining · April 14, 2010, 9:10pm

I am running mothur on a server (Xeon 2.8 GHz, 32 GB RAM) to classify OTUs for a 454 sequencing data set. The data set contains around 30,000 sequences (~4 GB distance matrix). It has been running for 30 h and still has not finished. The RAM is almost fully used by mothur, but very little CPU is occupied (<2%). Is this normal?
I also tried “hcluster” on another cluster with the same data. I just received an email from IT telling me that my program caused a terrible problem on the node, which behaved as if it were broken, and that the program was deleted. I have successfully analyzed a smaller data set (around 21,000 sequences) using mothur. Has anybody succeeded with a data set of this size?
(mothur was compiled according to the 64-bit instructions.)
Thank you!
Stone
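For a sense of scale, here is a back-of-the-envelope sketch of how many pairwise distances 30,000 sequences produce and how much space they might take. The per-distance byte costs are illustrative assumptions only; they do not describe how mothur actually stores a matrix in memory.

```python
def pairwise_count(n: int) -> int:
    """Number of distinct pairs among n sequences (lower triangle, no diagonal)."""
    return n * (n - 1) // 2


def estimate_gib(n: int, bytes_per_distance: float) -> float:
    """Rough size in GiB if every pairwise distance costs bytes_per_distance."""
    return pairwise_count(n) * bytes_per_distance / 2**30


n = 30_000
print(f"{pairwise_count(n):,} pairwise distances")

# Assumed per-distance costs; none of these are taken from mothur's source.
for label, cost in [
    ("text file, about 8 characters per value", 8),
    ("dense 4-byte floats", 4),
    ("hypothetical 24-byte sparse cell (value, indices, pointers)", 24),
]:
    print(f"{label}: {estimate_gib(n, cost):.1f} GiB")
```

30,000 sequences give 449,985,000 pairs, so a text matrix of a few gigabytes is plausible, and any in-memory structure that spends more than a few bytes per distance can approach the 32 GB of RAM on the server.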

Rewski52 · April 15, 2010, 4:53pm

Do you have tag sequences? You might try using filter.seqs(fasta=youraligned.sequences, trump=.) and then unique.seqs(fasta=youraligned.filtered.sequences, names=namesfile) after aligning the sequences, if you haven’t already.

Mothur should definitely be able to handle a 30,000 sequence distance matrix.
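To illustrate why collapsing duplicates before computing distances helps, here is a minimal sketch of dereplication in plain Python. It is not mothur's implementation of unique.seqs; the function name `dereplicate` and the sample reads are made up, and the printed two-column layout is only loosely modeled on a names file.

```python
from collections import OrderedDict


def dereplicate(records):
    """Collapse identical sequences.

    records: iterable of (name, sequence) pairs.
    Returns (unique, names): unique maps a representative name to its
    sequence, names maps that representative to every name sharing it.
    """
    first_seen = {}          # sequence -> representative name
    unique = OrderedDict()   # representative name -> sequence
    names = OrderedDict()    # representative name -> list of member names
    for name, seq in records:
        rep = first_seen.get(seq)
        if rep is None:
            # First time this exact sequence appears: it becomes the representative.
            first_seen[seq] = name
            unique[name] = seq
            names[name] = [name]
        else:
            names[rep].append(name)
    return unique, names


# Hypothetical aligned reads, for illustration only.
reads = [
    ("read1", "ACGT-AC"),
    ("read2", "ACGT-AC"),
    ("read3", "ACGTTAC"),
    ("read4", "ACGT-AC"),
]
unique, names = dereplicate(reads)
for rep, members in names.items():
    print(rep, ",".join(members), sep="\t")
```

Because the number of pairwise distances grows with the square of the number of sequences, halving the set of unique sequences cuts the distance matrix to roughly a quarter of its size.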

stoneraining · April 15, 2010, 5:03pm

The barcodes have been removed from my samples, and the unique sequences were selected with CD-HIT. The unique sequences were aligned with the Greengenes NAST aligner. The distance matrix was then computed in ARB, and the PHYLIP-formatted distance matrix was used to classify OTUs in mothur.
I also tried to align these sequences using mothur. However, it had not finished after 2 days.

pschloss · April 16, 2010, 11:32am

I suspect you’ve got a formatting issue. Can you post your logfile? There should be no problems with unique.seqs, align.seqs, dist.seqs, or cluster. I would argue that there are limitations (some of them significant) with each of the other methods you are using as a workaround, when mothur should be able to do things just fine. Are you following the Costello analysis example listed on the wiki?

When you run read.dist, are you reading the ARB matrix in as a phylip or a column-formatted matrix? When you’re using ARB to generate the matrix, you want to use the phylip option.
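As a way to rule out a formatting problem, here is a small sketch that checks a lower-triangular PHYLIP-style matrix and rewrites it as name/name/distance columns. It assumes the common layout (a count on the first line, then one row per sequence holding its name and the distances to all earlier sequences) and names without whitespace. The helper names are hypothetical, and this is not how mothur itself parses the file.

```python
def read_lower_triangle(lines):
    """Parse a lower-triangular PHYLIP-style matrix into (names, rows).

    Raises ValueError when the row count or a row length is not what the
    header implies, which is the kind of formatting issue to rule out first.
    """
    lines = [ln for ln in lines if ln.strip()]
    n = int(lines[0].split()[0])
    if len(lines) - 1 != n:
        raise ValueError(f"header says {n} sequences, found {len(lines) - 1} rows")
    names, rows = [], []
    for i, line in enumerate(lines[1:]):
        fields = line.split()
        # Row i must hold exactly i distances (to sequences 0..i-1).
        if len(fields) - 1 != i:
            raise ValueError(
                f"row {i} ({fields[0]}): expected {i} distances, got {len(fields) - 1}"
            )
        names.append(fields[0])
        rows.append([float(x) for x in fields[1:]])
    return names, rows


def to_column_format(names, rows):
    """Yield one tab-separated 'nameA nameB distance' line per pair."""
    for i, row in enumerate(rows):
        for j, d in enumerate(row):
            yield f"{names[i]}\t{names[j]}\t{d}"


# Tiny made-up matrix for illustration.
example = """3
seqA
seqB 0.02
seqC 0.10 0.08
""".splitlines()

names, rows = read_lower_triangle(example)
for line in to_column_format(names, rows):
    print(line)
```

A square matrix, a missing header count, or names containing spaces would all make this check fail, which is useful to know before spending days in the clustering step.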

stoneraining · April 16, 2010, 4:34pm

Here is the logfile; it didn’t show anything abnormal. It just got stuck at the cluster() step for 2 days. It also used too much RAM (more than 30 GB), so I killed the program yesterday. Then I tried DOTUR last evening on the same server, and it finished the OTU classification step this morning.

mothur > read.dist(phylip=all_16S_final.nast.dist)********************###########
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||

It took 164 secs to read

mothur > cluster()

pschloss · April 19, 2010, 9:10pm

Could you send the logfile and distance matrix to mothur.bugs@gmail.com?

Thanks,
Pat
