I’m running the MiSeq SOP on mothur version 1.39.5. I’m using multiple processors because there are >1000 read files in my analysis. I’ve run the following commands (up to the pre.cluster command in the SOP) without hassle:
make.file(inputdir=~/endv_mproc, type=fastq, prefix=stability)
make.contigs(file=stability.files, processors=64)
summary.seqs(fasta=stability.trim.contigs.fasta)
screen.seqs(fasta=stability.trim.contigs.fasta, group=stability.contigs.groups, minlength=371, maxambig=0, maxlength=420)
unique.seqs(fasta=stability.trim.contigs.good.fasta)
count.seqs(name=stability.trim.contigs.good.names, group=stability.contigs.good.groups)
align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.v4v5.fasta)
summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table)
screen.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, summary=stability.trim.contigs.good.unique.summary, start=3986, end=17783, maxhomop=8)
filter.seqs(fasta=stability.trim.contigs.good.unique.good.align, vertical=T, trump=., processors=64)
unique.seqs(fasta=stability.trim.contigs.good.unique.good.filter.fasta, count=stability.trim.contigs.good.good.count_table)
pre.cluster(fasta=stability.trim.contigs.good.unique.good.filter.unique.fasta, count=stability.trim.contigs.good.unique.good.filter.count_table, diffs=2)
Once it reaches pre.cluster, everything slows right down (it took around 50 hours to complete). I’ve checked my CPU usage, and mothur seems to use multiple processors before the pre.cluster command, then drops back to a single processor for pre.cluster. My log file shows:
mothur > pre.cluster(fasta=stability.trim.contigs.good.unique.good.filter.unique.fasta, count=stability.trim.contigs.good.unique.good.filter.count_table, diffs=2)
Using 64 processors.
Processing group endv:
2271040 595941 1675099
Total number of sequences before pre.cluster was 2271040.
pre.cluster removed 1675099 sequences.
It took 189282 secs to cluster 2271040 sequences.
It took 189480 secs to run pre.cluster.
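To put those log numbers in context, here is a small Python sketch (not part of mothur; the variable names and regular expressions are just illustrative assumptions) that pulls the figures out of the excerpt above, converts the run time to hours, and lists the groups that pre.cluster reported processing:

```python
import re

# Log excerpt copied verbatim from the pre.cluster output above.
log = """\
Using 64 processors.
Processing group endv:
2271040 595941 1675099
Total number of sequences before pre.cluster was 2271040.
pre.cluster removed 1675099 sequences.
It took 189282 secs to cluster 2271040 sequences.
It took 189480 secs to run pre.cluster.
"""

# Every "Processing group <name>:" line names one group that was clustered.
groups = re.findall(r"^Processing group (\S+):", log, flags=re.M)

# Pull the headline numbers out of the summary lines.
total = int(re.search(r"before pre\.cluster was (\d+)", log).group(1))
removed = int(re.search(r"removed (\d+) sequences", log).group(1))
run_secs = int(re.search(r"took (\d+) secs to run", log).group(1))

remaining = total - removed   # unique sequences left after pre.cluster
hours = run_secs / 3600       # seconds -> hours

print(f"groups processed: {groups}")
print(f"sequences remaining: {remaining}")
print(f"run time: {hours:.1f} hours")
```

On this excerpt it reports a single group (endv), 595941 sequences remaining (matching the middle number on the third log line), and a run time of about 52.6 hours.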
I’m not sure whether there is a problem with my hardware not using processors correctly for this command, whether I’ve input something incorrectly, or whether it’s normal for the command to take this long with so many sequence files.
Many thanks in advance for any advice!