dist.seqs issue?

shu251 · May 5, 2014, 10:56pm

Hi,
I am trying to calculate distances and cluster about 2 million sequences. Mothur keeps crashing my server when I run it on multiple processors. I’ve tried different numbers of processors, with no success. The server I’m using has 120 gigabytes of RAM, and the command clearly uses more than this before crashing the server. I can’t imagine why it needs that much RAM. I’ve started it running on a larger server with only 18 processors, but I imagine this will take many more days to complete (if it completes at all).

I’m loosely following the MiSeq SOP.

List of commands:

align.seqs(fasta=current, reference=/…/silva.eukarya/silva.eukarya.fasta)

filter.seqs(fasta=current)

unique.seqs(fasta=current)

summary.seqs(fasta=current, count=current)
pre.cluster(fasta=current, group=current, diffs=2)

get.current()
summary.seqs(fasta=current, count=current)

chimera.uchime(fasta=current, dereplicate=t)
get.current()

remove.seqs(fasta=current, count=current, accnos=current)
unique.seqs(fasta=current)
list.seqs(fasta=current)
get.seqs(accnos=current, group=current)
make.table(name=current)

summary.seqs(fasta=current, count=current)

classify.seqs(fasta=current, count=current, reference=/…silva.eukarya/silva.eukarya.silva.tax, cutoff=80)
dist.seqs(fasta=current, cutoff=0.10)
cluster(column=current, count=current)
make.shared(list=current, group=current)
summary.seqs(fasta=current, count=current)
get.current()
get.oturep(column=current, name=current, fasta=current, list=current, group=current)
classify.otu(list=current, count=current, taxonomy=current, persample=true)
get.current()
summary.seqs(fasta=current, name=current)

pschloss · May 7, 2014, 11:36am

I’m loosely following the MiSeq SOP.

That’s probably the problem. You have 2 million sequences, but how many uniques? With MiSeq data, we have found that unless the reads fully overlap each other (e.g. V4 on the 250 PE kit), you will not get good denoising of your data and will end up with an astronomical number of uniques, a gigantic distance matrix, and crashing computers. If this is the case, then you’re probably stuck with just using the phylotype command.
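To get a feel for why the number of uniques matters so much, here is a rough back-of-the-envelope sketch in Python. It only counts how many pairwise comparisons a full distance calculation implies for a given number of unique sequences. The function names and the 4-bytes-per-distance figure are assumptions for illustration, not how mothur actually stores distances, and a cutoff such as 0.10 means only a fraction of those distances would be kept.

```python
def pairwise_comparisons(n_unique: int) -> int:
    """Number of distinct sequence pairs among n_unique sequences: n*(n-1)/2."""
    return n_unique * (n_unique - 1) // 2


def worst_case_gib(n_unique: int, bytes_per_distance: int = 4) -> float:
    """Upper-bound size in GiB if every pairwise distance were stored.

    bytes_per_distance is a hypothetical figure chosen for illustration;
    it does not describe mothur's real file or memory layout.
    """
    return pairwise_comparisons(n_unique) * bytes_per_distance / 2**30


if __name__ == "__main__":
    # Growth is quadratic: 10x more uniques means roughly 100x more pairs.
    for n in (10_000, 100_000, 1_000_000):
        pairs = pairwise_comparisons(n)
        print(f"{n:>9,} uniques -> {pairs:,} pairs, "
              f"up to {worst_case_gib(n):,.1f} GiB if all were stored")
```

Because the count grows with the square of the number of uniques, cutting the uniques (better denoising, fully overlapping reads) helps far more than adding processors or RAM.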

Pat
