Possible to Combine Cluster.Split Output Files

cgivens · April 6, 2020, 9:05pm

A bit of an unorthodox question, but I'm curious whether there's a workaround.

I am running cluster.split with commands identical to these (with my file names substituted).

cluster.split(fasta=final.fasta, name=final.names, taxonomy=final.taxonomy, taxlevel=4, cluster=f, processors=8)
cluster.split(file=final.file, processors=2)

I was able to generate the distance matrix (it’s huge, but I got it) and the Files file. However, I am running into RAM issues on the second cluster.split command: it generates about 75% of the .opti_mcc.list files before failing. It seems that I can pass the Files file through multiple times with different samples included, so I could potentially generate the .opti_mcc.list files for all my samples.

My question is how I could then generate the “stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.pick.opti_mcc.list” that you use as input to make.shared. Is that possible? It seems that some step at the end of cluster.split concatenates the per-sample .opti_mcc.list files into a single grouped .opti_mcc.list, and I am trying to figure out how to do that myself once I have generated all the individual files through separate runs of the second cluster.split command.

Kendra · April 8, 2020, 7:26pm

Why not use dist.seqs with cutoff=0.03 and opticlust?

cgivens · April 8, 2020, 7:45pm

In a previous thread, the suggestion was to split the cluster.split commands for processing. I just can’t seem to get the second cluster.split command to run to completion in one go. I can only get it to run piecemeal, generating the individual sample .opti_mcc.list files.

pschloss · April 9, 2020, 3:15pm

You would have to do it manually or write a script to concatenate the individual list files.
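
A minimal sketch of what such a script might look like, in Python. It is not mothur's own merge code and rests on several assumptions: each list file is tab-delimited with an optional header line starting with `label`, each data line is `label`, `numOtus`, then one column per OTU holding comma-separated sequence names, every file contains a line for the same label (for example `0.03`), and each sequence appears in exactly one input file. The function names are made up for illustration, and the merged OTUs are simply renumbered in the order the files are given.

```python
def read_list_file(path):
    """Map each label (e.g. "0.03") to its OTUs; each OTU is a
    comma-separated string of sequence names."""
    clusters = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            # Skip blank lines and the optional "label numOtus Otu001 ..." header.
            if len(fields) < 3 or fields[0] == "label":
                continue
            clusters[fields[0]] = [otu for otu in fields[2:] if otu]
    return clusters


def merge_list_files(paths, label):
    """Concatenate the OTUs found at `label` across all files, in file order."""
    merged = []
    for path in paths:
        clusters = read_list_file(path)
        if label not in clusters:
            raise ValueError(f"{path} has no line for label {label!r}")
        merged.extend(clusters[label])
    return merged


def format_list(label, otus):
    """Render a header line plus one data line (assumed list-file layout)."""
    width = max(3, len(str(len(otus))))  # zero-pad OTU names to at least 3 digits
    names = [f"Otu{i:0{width}d}" for i in range(1, len(otus) + 1)]
    header = "\t".join(["label", "numOtus"] + names)
    row = "\t".join([label, str(len(otus))] + otus)
    return header + "\n" + row + "\n"


def write_merged(paths, label, out_path):
    """Write the merged list file and return the total number of OTUs."""
    otus = merge_list_files(paths, label)
    with open(out_path, "w") as handle:
        handle.write(format_list(label, otus))
    return len(otus)
```

It could be called as `write_merged(list_of_paths, "0.03", "merged.list")`, where the label and the output name are placeholders. It would be worth checking the result against a small dataset that cluster.split can finish in a single run before trusting it with make.shared.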

cgivens · April 9, 2020, 3:45pm

Is there a way to concatenate the files in mothur? I looked at a few options but none of them seemed appropriate.

pschloss · April 9, 2020, 4:49pm

Sorry, but there isn’t. The commands assume that you run them all the way through with cluster.split.

Kendra · April 9, 2020, 6:13pm

You likely need to reduce the number of processors. You need more RAM than the size of your largest dist file multiplied by the number of processors requested. So if you have a 50 GB dist file and you want to use 8 processors, you need 400 GB of RAM.
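
The rule of thumb above can be turned into a quick back-of-the-envelope check. This is only a sketch of that arithmetic, not something mothur provides; the function names and the 128 GB machine in the comment are hypothetical, and the result should be read as a lower bound on the RAM needed.

```python
import math


def min_ram_gb(largest_dist_gb, processors):
    """Lower bound on RAM: largest dist file size times processors requested."""
    return largest_dist_gb * processors


def max_processors(available_ram_gb, largest_dist_gb):
    """Largest processor count whose lower bound still fits in available RAM.

    Returns 0 if even a single processor would not fit.
    """
    return math.floor(available_ram_gb / largest_dist_gb)


# The example from the post: a 50 GB dist file with 8 processors.
print(min_ram_gb(50, 8))        # 400
# A hypothetical 128 GB machine with the same 50 GB dist file.
print(max_processors(128, 50))  # 2
```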
