giant dist file, but this one is different!

I have a set of 96 samples (mouse fecal, but I don’t know what the client was doing to them). V4, 2x250 run, totally standard. After following the normal SOP (pre.cluster with diffs=2, chimera checking, etc.) I’m left with ~150k seqs, which is a little high for mouse but not as high as the bioblitz samples. However, I’m still stuck with a giant dist file (cluster.split taxlevel=5, cutoff=0.15). 73k of the seqs all end up in one dist file, so it’s 200 GB. I checked 10 or so of the seqs and they are all ID’d to

Bacteria(100);Bacteroidetes(100);Bacteroidia(100);Bacteroidales(100);S24-7(100);unclassified;

so going down to taxlevel=6 won’t help. The server has 512 GB of RAM, so I guess I could just drop processors down to 1 for the actual clustering step and wait, but I don’t want to hog the shared resource if there’s something else to try.
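
For a rough sense of scale, here is a back-of-the-envelope sketch of how many pairwise distances 73k seqs can generate. It assumes "200 GB" means 200 x 10^9 bytes, and `pair_count` is just a helper name made up for this illustration. The count is an upper bound: with cutoff=0.15 only pairs at or under the cutoff are kept, but since these seqs all classify to the same family, a large share of the pairs may well fall under it.

```python
def pair_count(n_seqs: int) -> int:
    """Number of unordered sequence pairs, i.e. n choose 2."""
    return n_seqs * (n_seqs - 1) // 2


n_seqs = 73_000                 # seqs that landed in the single big split
pairs = pair_count(n_seqs)      # upper bound on rows in the dist file

file_bytes = 200e9              # assumption: "200 GB" read as 200 * 10^9 bytes
bytes_per_pair = file_bytes / pairs  # average bytes per pair IF every pair were written

print(f"possible pairs: {pairs:,}")
print(f"bytes per pair if all pairs were kept: {bytes_per_pair:.0f}")
```

That works out to about 2.66 billion possible pairs, so the file size scales with the square of the number of unique seqs in the split. Anything that trims the unique seqs in that one group shrinks the file much faster than linearly.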

You might try diffs=3 in pre.cluster, but that’s about all I’ve got for you, for now.
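
To illustrate why a larger diffs value can help, here is a toy sketch of abundance-ordered merging. It is not mothur's actual pre.cluster code, just a simplified greedy version under the assumption that all seqs are aligned to the same length; `hamming` and `toy_precluster` are hypothetical names, and the sequences and counts are made up.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length aligned seqs."""
    return sum(x != y for x, y in zip(a, b))


def toy_precluster(counts: dict, diffs: int) -> dict:
    """Greedy toy merge: walk seqs from most to least abundant and fold each
    one into the first already-kept seq within `diffs` mismatches."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = []  # list of [seq, merged_abundance]
    for seq, n in ordered:
        for center in kept:
            if hamming(seq, center[0]) <= diffs:
                center[1] += n  # absorb the rarer seq into the abundant one
                break
        else:
            kept.append([seq, n])  # nothing close enough, keep as its own unique
    return {seq: n for seq, n in kept}


# Made-up example data, not from the run discussed in this thread.
example = {
    "AAAAAAAAAA": 100,
    "AAAAAAAATT": 5,   # 2 mismatches from the abundant seq
    "AAAAAAATTT": 3,   # 3 mismatches from the abundant seq
    "CCCCCCCCCC": 50,
}

print(len(toy_precluster(example, diffs=2)))  # uniques left with diffs=2
print(len(toy_precluster(example, diffs=3)))  # uniques left with diffs=3
```

In this toy case diffs=2 leaves three uniques and diffs=3 leaves two. Fewer uniques going into the distance step means fewer pairs, at the cost of lumping together seqs that differ a bit more.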

Pat

Ooh, liking the sound of “for now”

:smiley: