Hi there,
I tried to run cluster.split on my MiSeq data but could not complete it, because the temporary files it produced took up approximately 21 TB. I then checked how many sequences I was trying to cluster: 929,618 unique out of 6,416,820 total, down from approximately 26 million raw reads at the start of my analysis. Someone must have encountered this problem before. Any insight into what I can do?

Thanks in advance,

I think this will explain what’s going wrong and perhaps some paths forward…
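
As a rough back-of-envelope check on the numbers in the question, the sketch below counts how many pairwise comparisons 929,618 unique sequences imply and turns that into a disk estimate. It assumes the temporary files are dominated by pairwise distance records. The 50 bytes per record and the `kept_fraction` parameter are made-up illustrative values, not measured from mothur output, so treat the result as an order-of-magnitude guide only.

```python
def pairwise_count(n_unique: int) -> int:
    """Number of unordered pairs among n_unique sequences: n * (n - 1) / 2."""
    return n_unique * (n_unique - 1) // 2


n_unique = 929_618  # unique sequences reported in the question
pairs = pairwise_count(n_unique)

# ASSUMPTION: average size of one stored distance record (two names plus a
# distance, as text). Hypothetical value for illustration only.
bytes_per_record = 50

# ASSUMPTION: fraction of pairs actually written to disk. 1.0 means every
# pair is kept; a distance cutoff would make this smaller.
kept_fraction = 1.0

estimated_tb = pairs * bytes_per_record * kept_fraction / 1e12
print(f"{pairs:,} pairs, roughly {estimated_tb:.1f} TB under these assumptions")
```

The point is that the pair count grows with the square of the number of unique sequences, so halving the unique sequences cuts the estimate to about a quarter.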