dist.seqs issue?

Hi,
I am trying to calculate distances and cluster about 2 million sequences. Mothur keeps crashing my server when I run it on multiple processors. I’ve tried different numbers of processors, with no success. The server I’m using has 120 gigabytes of RAM; the command clearly uses more than that and then crashes the server. I can’t imagine why it needs that much RAM. I’ve started it on a larger server with only 18 processors, but I imagine this will take many more days to complete (if it completes at all).

I’m loosely following the MiSeq SOP.

List of commands:

align.seqs(fasta=current, reference=/…/silva.eukarya/silva.eukarya.fasta)

filter.seqs(fasta=current)

unique.seqs(fasta=current)

summary.seqs(fasta=current, count=current)
pre.cluster(fasta=current, group=current, diffs=2)

get.current()
summary.seqs(fasta=current, count=current)

chimera.uchime(fasta=current, dereplicate=t)
get.current()

remove.seqs(fasta=current, count=current, accnos=current)
unique.seqs(fasta=current)
list.seqs(fasta=current)
get.seqs(accnos=current, group=current)
make.table(name=current)

summary.seqs(fasta=current, count=current)

classify.seqs(fasta=current, count=current, reference=/…silva.eukarya/silva.eukarya.silva.tax, cutoff=80)
dist.seqs(fasta=current, cutoff=0.10)
cluster(column=current, count=current)
make.shared(list=current, group=current)
summary.seqs(fasta=current, count=current)
get.current()
get.oturep(column=current, name=current, fasta=current, list=current, group=current)
classify.otu(list=current, count=current, taxonomy=current, persample=true)
get.current()
summary.seqs(fasta=current, name=current)

I’m loosely following the MiSeq SOP.

That’s probably the problem. You have 2 million sequences - how many uniques? With MiSeq data, we have found that unless the reads fully overlap each other (e.g. V4 on the 250 PE kit), you will not get good denoising of your data and will end up with an astronomical number of uniques, a gigantic distance matrix, and crashing computers. If this is the case, then you’re probably stuck with just using the phylotype command.
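
To give a feel for why the number of uniques matters so much, here is a rough back-of-the-envelope sketch in Python. It only illustrates that the number of pairwise distances grows with the square of the number of uniques. The 4 bytes per distance is an assumption made for illustration and is not a statement about how mothur stores distances internally; the function names are hypothetical as well. A cutoff such as 0.10 means only distances below the cutoff are kept, so real usage should be lower than this worst case, but every pair still has to be computed.

```python
def pairwise_count(n_uniques: int) -> int:
    """Number of unordered pairs among n unique sequences: n * (n - 1) / 2."""
    return n_uniques * (n_uniques - 1) // 2


def worst_case_gigabytes(n_uniques: int, bytes_per_distance: int = 4) -> float:
    """Upper bound on storage if every pairwise distance were kept.

    bytes_per_distance is an illustrative assumption, not mothur's
    actual storage format.
    """
    return pairwise_count(n_uniques) * bytes_per_distance / 1e9


if __name__ == "__main__":
    # Hypothetical unique counts, chosen only to show the quadratic growth.
    for n in (10_000, 100_000, 1_000_000):
        pairs = pairwise_count(n)
        gigabytes = worst_case_gigabytes(n)
        print(f"{n:>9,} uniques -> {pairs:,} pairs, up to ~{gigabytes:,.1f} GB")
```

Under these assumptions, a tenfold increase in uniques means roughly a hundredfold increase in distances, which is why a dataset that does not denoise well can outgrow even a large server.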

Pat