Cluster.split problem

I am trying to troubleshoot a cluster.split error.

I have run a MiSeq data set through pre.cluster and then chimera.uchime. The former provided the input for the latter, that is, the fasta and count_table files. After chimera.uchime I used remove.seqs with the .accnos file. I then went to cluster.split and used as input a file labeled:

and a count file labeled

cluster.split seemed to chug along fine. While parsing the data set it created 157 .dist files along with the associated count_tables.

Each of the .dist files contains a list of pairs of sequence names and distances. Most of the count_table.XX.temp files have disappeared. Of those that remain, most contain sequence names and counts, but some are empty: no representative sequence names and no counts.
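To see exactly which of the remaining temp files are empty, something like the following could be run in the working directory. It is only a sketch: the function name `find_empty_files` and the default `*.temp` pattern are my own assumptions, not anything mothur provides, and the pattern may need adjusting to the actual file names.

```python
from pathlib import Path


def find_empty_files(directory, pattern="*.temp"):
    """Return the sorted names of zero-byte files in `directory`.

    `pattern` is a glob pattern; "*.temp" is an assumed default and
    should be changed to match the real temp file names.
    """
    return sorted(
        p.name
        for p in Path(directory).glob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # "." is the current directory; point this at the mothur output folder.
    for name in find_empty_files("."):
        print(name)
```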

It’s possible this is okay. However, mothur now appears to be stuck printing the same ERROR message over and over; it is scrolling rapidly up the screen and, of course, filling the buffer:

[ERROR]: your count table contains more than 1 sequenced named , sequence names must be unique. Please correct.

Any hints on the source of this problem would be much appreciated. Is it more likely a problem with the temp files, with the .uchime.pick.count_table, or with the fasta file?
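One way to narrow this down would be to check the count table itself for blank or repeated names, since the error complains about more than one sequence with an empty name. Below is a minimal sketch, assuming the count table is tab-separated, has one header line, and keeps the sequence name in the first column; lines starting with `#` are skipped in case the file carries comment lines. The function name `problem_names` and the file name in the usage example are hypothetical.

```python
from collections import Counter


def problem_names(lines):
    """Return (blank_count, duplicates) for a count table.

    Assumptions: tab-separated rows, the first non-comment line is the
    header, and the first column is the sequence name. A completely
    empty line after the header is counted as a blank name.
    """
    names = []
    header_seen = False
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            continue  # assumed comment line
        if not header_seen:
            header_seen = True  # skip the header row
            continue
        names.append(line.split("\t")[0].strip())
    counts = Counter(names)
    blank = counts.pop("", 0)
    duplicates = {name: n for name, n in counts.items() if n > 1}
    return blank, duplicates


if __name__ == "__main__":
    # Hypothetical file name; substitute the real count table.
    with open("example.count_table") as handle:
        blank, duplicates = problem_names(handle)
    print("rows with a blank name:", blank)
    print("names appearing more than once:", duplicates)
```

If the main count table comes back clean, the same check can be pointed at the surviving temp files; a non-zero blank count there would suggest the problem arises during the split rather than in the input.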

At the moment, I can let mothur run a while to see how it concludes, if it does. But it seems to be in a loop and may need to be terminated and restarted.



Reading a previous post suggests that the problem might be the size of the distance matrix created by cluster.split, which is very large. It would be good to know whether that can prevent the cluster.split run from completing; the error reported doesn’t provide a hint. Also, the log file shows that mothur couldn’t open a handful of the .temp files it created. A few of those turned out to be empty, but others were not.

Am I stuck with a phylotype analysis for this data set?