[cluster.split] : temp directory and groupfile/listfile mismatch

Hello,

I’m running cluster.split(splitmethod=classify, taxlevel=3, cluster=f, method=opti, cutoff=0.05, column=final.dist, count=final.count_table, taxonomy=final.taxonomy) with mothur/1.44.3 on a Linux cluster, and I’m having two issues:

  1. I want to redirect the temporary matrices that are created (e.g. .dist.1.temp) to another existing directory using set.dir(output=/scratch/users/vgilbart, tempdefault=/scratch/users/vgilbart/temp), but this does not seem to work. mothur still creates the .dist.00.temp files in the same folder as the original matrix (the same goes for the .count_table, although this is not my main issue). Only the .file is written to the directory I want.
    Is there any way I can change that, or is it mandatory for the temp matrices to be written in the same folder as the original one?

  2. I still carried on with the clustering by running cluster.split(file=final.file, method=opti, cutoff=0.05), which gave the following warning:

[WARNING]: Cannot run sens.spec analysis without a phylip or column file, skipping.
Output File Names: 
final.opti_mcc.list

I then encountered an error while running make.shared(list=final.opti_mcc.list, count=final.count_table):

[ERROR]: SOME_SEQUENCE is in your groupfile and not your listfile. Please correct.
Your group file contains 229716 sequences and list file contains 229712 sequences. Please correct.

I have seen this type of error happen to others on the forum, mostly due to mistakes in generating the .count or .group file before the clustering. But I have run make.shared() after other clustering methods (agc and dgc, although I know the mothur team does not recommend these methods) with the same .count_table file without any issue.
So I can only suppose the problem is due to sequences missing from the .list file created by the cluster.split() command? Or do you have another explanation for why the make.shared() command works with a .list file created by cluster(method=agc) but not with one created by the cluster.split(method=opti, ...) command?

Thank you very much for your effort in documentation and this forum, it has already helped me a lot in better understanding mothur!
Best,
Valentine

"Is there any way I can change that, or is it mandatory for the temp matrices to be written in the same folder as the original one?"

When you split using the complete distance matrix, mothur creates the temp dist files in the same location as the original matrix. Instead of providing the complete matrix, you can let mothur calculate the distances itself, which allows you to use the outputdir parameter to set the location of the temp dist files. Calculating the matrices this way also takes less time, because mothur only finds distances between sequences in the same taxonomic group.

Instead of:

cluster.split(splitmethod=classify, taxlevel=3, cluster=f, method=opti, cutoff=0.05, column=final.dist, count=final.count_table, taxonomy=final.taxonomy)

try:

cluster.split(splitmethod=classify, taxlevel=3, cluster=f, method=opti, cutoff=0.05, fasta=final.fasta, count=final.count_table, taxonomy=final.taxonomy, outputdir=/scratch/users/vgilbart/temp)

The error you are getting from the make.shared command indicates 4 reads are missing from the list file. If you send your fasta, count and taxonomy files to mothur.bugs@gmail.com, I can track down the issue for you.
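
If you would like to see which reads are missing first, here is a minimal Python sketch that compares the sequence names in the count table with those in the list file. It is not part of mothur. The function names are hypothetical, and it assumes the usual tab-delimited layouts: a count table whose first column holds the sequence names under a Representative_Sequence header, and a list file whose rows are label, number of OTUs, then one comma-separated OTU per column. Only the first label in the list file is checked.

```python
def count_table_names(path):
    """Return the sequence names from the first column of a count table."""
    names = set()
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Skip blank lines, '#' comment lines, and the header row.
            if (not fields[0] or fields[0].startswith("#")
                    or fields[0] == "Representative_Sequence"):
                continue
            names.add(fields[0])
    return names


def list_file_names(path):
    """Return the sequence names from the first label (data row) of a list file."""
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Skip short lines and the optional header row starting with 'label'.
            if len(fields) < 3 or fields[0] == "label":
                continue
            names = set()
            # Columns from the third onward each hold one OTU: name1,name2,...
            for otu in fields[2:]:
                names.update(name for name in otu.split(",") if name)
            return names
    return set()


def missing_from_list(count_path, list_path):
    """Return the sorted names that are in the count table but not in the list file."""
    return sorted(count_table_names(count_path) - list_file_names(list_path))
```

Calling missing_from_list("final.count_table", "final.opti_mcc.list") should return the names that make.shared complains about, which you can then look up in the taxonomy file to see whether they share a taxonomic group.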
