I have 25 cultured soil samples sequenced at the V3-V4 region that I am trying to get through the cluster splitting step. I am also very new to any type of sequencing analysis, so any specific help that can be given will be much appreciated.
I have read why V3-V4 is not a good idea, as well as the "Why do I have such a large distance matrix" article, and have some specific questions about how I can make the most of the situation I’m in.
Can I get this data through cluster.split and how?
I was able to make the dist file with a cutoff of 0.1, and the resulting file is 125 GB. Is there any way that I can get this through cluster.split? And if so, what is the best way to do that? I have access to a supercomputing institute with many different partitions to choose from. Two of the highest powered are a GPU computer with 128 cores, 1000 GB, and a job time limit of 24 hours, and a CPU computer with 128 cores, 2000 GB, and a job time limit of 96 hours. If it’s possible to get this data through cluster.split, what computing parameters (how many cores, how much RAM, etc.) and what specific commands/parameters in cluster.split (opticlust vs. average, cutoffs, taxlevel, etc.) would give me the highest likelihood of success?
Would it be better to just use the phylotype based approach?
I have read that this may be more feasible in the circumstances I’m in. If this approach is better, what would be the best way to implement it? I understand that I need to redo the classify.seqs command before this, but will this approach affect how I execute the downstream commands such as normalization, alpha and beta diversity, etc.?