I’m trying to run cluster.split, but my dist file is 112 GB and the command had been running for over 96 hours before I cancelled it (I didn’t think it should take that long). I started preprocessing with over 350,000 sequences and am down to just over 100,000. Is there something I’m missing? I’ve looked at other posts, and apparently even a 30 GB dist file is considered massive… (Sorry, very new to this.)

Yeah, there’s no reason you should be working with 350,000 sequences as input. Have you looked at the Costello Analysis Example? It’s important to get the error correction right, because the sequencer basically acts like a random sequence generator. Sequencing errors make downstream processing very difficult (as you have found) and artificially inflate the apparent biodiversity of your samples.
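For reference, the error-correction stage in that kind of workflow usually looks something like the batch below. This is only a sketch: the file names, reference alignment, and coordinates/cutoffs are placeholders you would replace with values appropriate to your own data and amplicon region, not a recipe to run verbatim.

```
# Hypothetical mothur batch illustrating typical denoising steps.
# All file names and numeric parameters here are placeholders.
screen.seqs(fasta=seqs.fasta, group=seqs.groups, maxambig=0, maxlength=275)
unique.seqs(fasta=seqs.good.fasta)
align.seqs(fasta=seqs.good.unique.fasta, reference=reference.align)
screen.seqs(fasta=current, start=1968, end=11550)   # coordinates depend on your region
filter.seqs(fasta=current, vertical=T, trump=.)
pre.cluster(fasta=current, name=current, diffs=2)   # merge near-identical reads
chimera.uchime(fasta=current, name=current, dereplicate=t)
remove.seqs(fasta=current, accnos=current)          # drop flagged chimeras
```

The point of these steps is that aggressive screening, pre-clustering, and chimera removal collapse sequencing errors before you ever compute distances, so the dist file stays a manageable size.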