I have a set of 96 samples (mouse fecal, but I don’t know what the client was doing to them). V4, 2x250 run, totally standard. After following the normal SOP (pre.cluster with diffs=2, chimera checking, etc.), I’m left with ~150k unique seqs, which is a little high for mouse but not as high as the bioblitz samples. However, I’m still stuck with a giant dist file (cluster.split with taxlevel=5, cutoff=0.15). 73k of the seqs all end up in one dist file, so it’s 200 GB. I checked 10 or so of those seqs and they are all classified as
Bacteria(100);Bacteroidetes(100);Bacteroidia(100);Bacteroidales(100);S24-7(100);unclassified;
so going down to taxlevel=6 won’t help. The server has 512 GB of RAM, so I guess I could just drop the number of processors for the actual clustering step down to 1 and wait, but I don’t want to hog the shared resource if there’s something else to try.
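
For a rough sense of why a single 73k-sequence group gets so large, here is a back-of-the-envelope sketch in Python. It only counts the pairwise comparisons and multiplies by an assumed line size; the 75 bytes per line and the `kept_fraction` parameter are illustrative guesses, not values taken from mothur or from this dataset, and the function names are made up for the example.

```python
def pair_count(n_seqs: int) -> int:
    """Number of unordered sequence pairs, n * (n - 1) / 2."""
    return n_seqs * (n_seqs - 1) // 2


def estimated_size_gb(n_seqs: int, bytes_per_line: float,
                      kept_fraction: float = 1.0) -> float:
    """Rough size in GB of a column-format distance file.

    bytes_per_line: ASSUMED average length of one "nameA nameB dist" line.
    kept_fraction:  ASSUMED share of pairs at or below the cutoff
                    (1.0 is the worst case where every pair is written).
    """
    return pair_count(n_seqs) * kept_fraction * bytes_per_line / 1e9


if __name__ == "__main__":
    n = 73_000  # sequences stuck in the single S24-7 group
    print(f"pairs: {pair_count(n):,}")
    # 75 bytes per line is a guess; real lines depend on sequence name length.
    print(f"worst-case size: {estimated_size_gb(n, 75):.0f} GB")
    # Quadratic growth: half as many sequences gives about a quarter of the pairs.
    print(f"pairs at n/2: {pair_count(n // 2):,}")
```

Since the pair count grows with the square of the group size, anything that reduces the number of unique sequences in that one group shrinks the dist file much faster than linearly.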