I have come to understand the controversy over V3 versus V2 chemistry and the nasty, large distance matrices it generates (mine is 1.8 TB)… I’ve read the threads about this, but size and memory are not really the problem, as I’m working on a high-performance cluster. I’m using Illumina paired-end bacterial variable region 4 (V4) data, more or less following the Mothur MiSeq SOP mixed with this example: https://www.abdn.ac.uk/genomics/documents/Mothor_training_guide.pdf
My sticking point is running cluster.split (which I’m using because #cluster is not parallel).
However, as I watched the .dist..temp file grow, it seemed to delete itself and start rebuilding (somewhere around 1 TB, mothur started writing a new disttemp). Is mothur working as it should? Other dist..temp files are being created, but they’re only in the kilobyte range. I’m running the most current version, 1.37.5.
I tried my full set of commands in a script that worked with one sample, then with two, then with about 6, all using #cluster. Now that I’ve scaled up to 96 samples… I thought it would be useful to go with #cluster.split. Am I missing something, or just being impatient?
Thanks for reading!