I first tried to run cluster.split, and when it crashed, I tried dist.seqs + cluster on 192 files (the dist.seqs file is 1.7 TB... is that normal?), which also crashed.
I noticed that during the process all the RAM was used (32 GB) and also the swap (8 GB, on a 256 GB SSD). Do you think I should reinstall my system with more swap? Or is the 1.7 TB file not normal?
I tried increasing swap with an SSD and it still didn't work. You'll need to use cluster.split, because to cluster a 1.7 TB file you'd need 1.7 TB of RAM. You can look at your temp .dist files to see if it's going to work: your biggest .dist size multiplied by the number of processors needs to be smaller than your RAM. With 32 GB of RAM, you'll probably want to use only one processor for the cluster.split command.
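The rule of thumb above can be sketched as a quick shell check. This is a minimal sketch, not a mothur feature: the `biggest_gb` value here is hypothetical, and in practice you would read it off your largest temp .dist file (e.g. with `du`).

```shell
# Rule of thumb: (biggest temp .dist size) x (processors) must be < RAM.
biggest_gb=20   # hypothetical: size of your largest temp .dist file, in GB
                # e.g. on GNU systems: du -BG *.dist | sort -n | tail -1
procs=1         # processors you plan to give cluster.split
ram_gb=32       # physical RAM, in GB

needed=$(( biggest_gb * procs ))
if [ "$needed" -lt "$ram_gb" ]; then
  echo "should fit: needs ~${needed}GB, have ${ram_gb}GB"
else
  echo "will likely exceed RAM: needs ~${needed}GB, have ${ram_gb}GB"
fi
```

With these example numbers the check passes, which is why the reply suggests dropping to a single processor: each extra processor multiplies the in-memory footprint.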
Thanks for your reply!
Indeed, it still crashed even with more swap, and I proceeded with scp.
Sounds like we are having similar problems. I think we have too many sequences, probably inflated by errors. I had a 1.5 TB distance file.
MiSeq v2 2x250? I've had datasets where I needed to increase to pre.cluster(diffs=3) and cluster.split(taxlevel=5).