If you have a dataset that isn’t working with the standard SOP (pre.cluster at 1%, cluster.split at taxlevel 4/5) because of distance matrix size issues, would it be better to increase pre.cluster to 1.5% or to set cluster.split taxlevel=6?
I’m reclustering my huge and diverse Bioblitz data (soon it will all be public, with the analyses on GitHub!). When I was initially clustering, I had to change cluster.split taxlevel to 6 to try to get the biggest dist matrix below 200 GB. This time I decided to increase preclustering to diffs=3, and cluster.split with taxlevel=4 is working (the largest dist matrix is ~35 GB). This leads me to think that I’m better off increasing preclustering, at least for this dataset, which is likely heavy on unclassified sequences. What do others do? How do you decide which part of the SOP to tweak when you run into computational limits? (I have 256 GB of RAM, so it’s a pretty big limit.)
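
For a rough sense of why reducing the number of unique sequences might help more than splitting further, here is a back-of-envelope sketch. The number of pairwise distances in a bin grows roughly with the square of the number of unique sequences in it, so halving the uniques cuts the worst case to about a quarter. The function names, the 40-byte row size, and the sequence counts are illustrative assumptions, not values from my data or from mothur, and a real dist file should be smaller because distances above the cutoff are not kept.

```python
def pair_count(n_unique: int) -> int:
    """Number of pairwise comparisons among n_unique sequences."""
    return n_unique * (n_unique - 1) // 2


def worst_case_dist_gb(n_unique: int, bytes_per_row: int = 40) -> float:
    """Upper bound (in GB) on a column-format distance file if every pair were kept.

    bytes_per_row is an assumed average line length, not a measured value.
    """
    return pair_count(n_unique) * bytes_per_row / 1e9


if __name__ == "__main__":
    # Hypothetical sizes for the largest bin before and after heavier preclustering.
    for n in (100_000, 50_000):
        print(f"{n} uniques: {pair_count(n)} pairs, "
              f"at most ~{worst_case_dist_gb(n):.1f} GB")
```

If this reasoning holds, a higher taxlevel only helps when the largest bin actually gets divided, which may not happen when most of that bin is unclassified.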