mothur cluster command

Hi,
I am running a mothur analysis of 600,000 16S 454 sequences. With this many sequences, the cluster command is difficult to run: it takes a very long time, and I found that it cannot use multiple processors and runs out of memory. Is there a solution for this case? Thanks for any advice on this.

Yao

Are you following http://www.mothur.org/wiki/Schloss_SOP?

Dear Pat,
Thanks for your reply.
I ran the hcluster command, but there were some errors in the process (see the log below), and the output files only include results at the unique cutoff.
Could you please give me some suggestions?
Thanks, Yao


mothur > dist.seqs(fasta=SCS454seq.unique.filter.fasta, cutoff=0.03, processors=16) …

Output File Name:
SCS454seq.unique.filter.dist

It took 215569 to calculate the distances for 495640 sequences.

mothur > hcluster(column=SCS454seq.unique.filter.dist, name=SCS454seq.names, method=furthest)
[ERROR]: Could not open SCS454seq.unique.filter.sorted.dist.temp
It took 34779 seconds to sort.
[ERROR]: SCS454seq.unique.filter.sorted.dist is blank. Please correct.
changed cutoff to 10.005

Output File Names:
SCS454seq.unique.filter.fn.sabund
SCS454seq.unique.filter.fn.rabund
SCS454seq.unique.filter.fn.list

It took 1 seconds to cluster.

I would stay far away from hcluster. It also doesn’t look like you are following the SOP. There’s really little chance that you would have 500k unique sequences if you were denoising your sequences, using screen.seqs (you aren’t), and filtering with trump=… I’d suggest following the SOP and seeing how it goes from there.

Pat
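
For a sense of scale, here is a minimal Python sketch (not part of mothur or the SOP) of why the number of unique sequences matters so much. It only counts the pairwise comparisons a distance calculation has to consider for n sequences, n(n-1)/2. The function name `pairwise_count` and the smaller sequence counts are hypothetical and purely illustrative; only 495640 comes from the log above. With cutoff=0.03 the distance file itself should hold far fewer entries, since only distances at or below the cutoff are kept, so treat these numbers as an upper bound on the work, not a file size.

```python
def pairwise_count(n: int) -> int:
    """Number of distinct sequence pairs among n unique sequences."""
    return n * (n - 1) // 2


# 495640 is the unique-sequence count reported by dist.seqs in the log above.
n_unique = 495_640
print(f"{n_unique:,} unique sequences -> {pairwise_count(n_unique):,} pairs")

# Hypothetical smaller counts, shown only to illustrate how quickly the
# number of pairs falls once fewer unique sequences remain.
for n in (100_000, 50_000, 10_000):
    ratio = pairwise_count(n_unique) / pairwise_count(n)
    print(f"{n:,} unique sequences -> {pairwise_count(n):,} pairs "
          f"(about {ratio:,.0f}x fewer)")
```

Because the pair count grows with the square of n, halving the number of unique sequences cuts the comparisons roughly fourfold, which is why reducing the unique count before dist.seqs and clustering pays off.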