I have a large dataset (234 samples), and when I run cluster.split, the program creates the temporary distance files, each of which takes up 800 GB or more, which is causing the run to exhaust my hard-drive space.
- Is this normal?
- Is this the result of having too many sequences?
- What strategies would reduce the file size (and presumably the run time)?
The command I'm running is:

cluster.split(fasta=current, name=current, taxonomy=current, splitmethod=classify)
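In case it helps to know what I've already looked at: one variant I was considering, based on the taxlevel and cutoff parameters described on the wiki, is below. The values are guesses on my part, and I haven't verified that they actually shrink the temp distance files:

cluster.split(fasta=current, name=current, taxonomy=current, splitmethod=classify, taxlevel=4, cutoff=0.03)

My understanding is that a deeper taxlevel splits the data into smaller groups before clustering, and a cutoff limits which distances are written out, but I'd appreciate confirmation that this is the right approach.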