cluster.split(processors=32) regenerating large *.temp?

Hello all,

I have come to understand the controversy between V3 and V2 chemistry and its generation of nasty, large distance matrices (mine is 1.8 TB). I've read about the issues with this, but size and memory are not quite the problem, as I'm working on a high-performance cluster. I'm using Illumina paired-end bacterial variable region 4 data, more or less following the mothur MiSeq SOP mixed with this example: https://www.abdn.ac.uk/genomics/documents/Mothor_training_guide.pdf

My problem point is running cluster.split (since cluster is not parallelized):

cluster.split(column=ancil.trim.contigs.good.unique.good.filter.precluster.pick.pick.subsample.dist,
name=ancil.trim.contigs.good.unique.good.filter.precluster.pick.pick.subsample.names,
large=T, processors=36)

However, as I've watched the .dist.temp file grow in size, it seems to delete itself and start rebuilding (somewhere around 1 TB it started writing a new .dist.temp). Is mothur working as it should? Other .dist.temp files are being created, but they're only in the kilobyte range. I'm running the most current version, 1.37.5.

I tried my full set of commands in a script with one sample, then two, then about six, all using cluster, and it worked. Now that it's scaled up to 96 samples, I thought it best to switch to cluster.split. Am I missing something, or am I just being impatient?

Thanks for reading!

I'm not sure this is worth trying unless you really have multiple TB worth of RAM. You will get a separate dist temp file for each taxonomic level, so it will keep generating new files as it calculates the distances for each level. Your local sysadmin might also have settings keeping you from generating such large files.
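If you want to avoid the giant matrix entirely, the route the MiSeq SOP takes is to let cluster.split classify first and only compute distances within each taxon, so the full column matrix is never written. A rough sketch using your file names (the fasta and taxonomy file names here are guesses at what your earlier steps produced; adjust to match your own output):

# split by taxonomy at level 4, then cluster each group separately;
# distances are only computed within each taxon, so no single huge .dist file
cluster.split(fasta=ancil.trim.contigs.good.unique.good.filter.precluster.pick.pick.subsample.fasta,
name=ancil.trim.contigs.good.unique.good.filter.precluster.pick.pick.subsample.names,
taxonomy=ancil.trim.contigs.good.unique.good.filter.precluster.pick.pick.subsample.taxonomy,
splitmethod=classify, taxlevel=4, cutoff=0.15, processors=36)

Each group's distance file should then stay small enough that you don't need large=T, and the temp files will be per-taxon rather than one monster.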

Pat