Dist file from cluster.split

Hi,

I have a large dataset, so I used the cluster.split command in two steps:

  1. First Step:

cluster.split(fasta=.fasta, count=.count_table, taxonomy=final.taxonomy, taxlevel=4, cluster=f)

Output File Names: Finding singletons (ignore ‘Removing group’ messages):

  • mydata.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.file
  • mydata.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.0.disttemp
  • mydata.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.0.count.temp

Note that these temporary files are neither in my working directory nor in my temporary files folder.

  2. Second Step:

cluster.split(file=mydata.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.file, count=.count_table)
This step used the temp dist files that were created in the previous step, successfully did the clustering, and created the output.
The output file is opti_mcc.list.

The problem is that I did not get the dist file from the first step. I only had temp dist files and did not merge them to create the final dist file. I also noticed that those files have disappeared.

This is the warning that I received in my cluster run (second step):

It took 550020 seconds to cluster.
Merging the clustered files…
It took 53 seconds to merge.
[WARNING]: Cannot run sens.spec analysis without a column file; skipping.
Output File Names:
mydata.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.opti_mcc.list

How can I get the distance file?

Thanks

You would need to generate a distance matrix by running dist.seqs. The problem with cluster.split is that you lose all of the distances between taxa, and there is no guarantee that the distance between taxa is less than your threshold.
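
As a rough sketch only, assuming the same fasta and count files that were passed to cluster.split (their names are elided here exactly as in the post) and a cutoff of 0.03, which is an assumption and should be replaced with whatever threshold you clustered at, the commands might look something like this. The processors value and the .dist placeholder for the dist.seqs output are also illustrative, not taken from your run:

```
# Compute pairwise distances for all sequences, keeping only those at or below the cutoff.
# cutoff=0.03 and processors=8 are assumed values; adjust them to your analysis and machine.
dist.seqs(fasta=.fasta, cutoff=0.03, processors=8)

# Optionally rerun the sens.spec analysis that cluster.split skipped, using the new column file.
# ".dist" stands in for whatever file dist.seqs actually reports in its output.
sens.spec(list=mydata.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.opti_mcc.list, column=.dist, count=.count_table, cutoff=0.03)
```

Keep in mind that with a large dataset the resulting distance file can be very large, and dist.seqs can take a long time to run.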

Pat