Cluster error --

stevewhitemd · October 13, 2016, 4:49pm

Hi all

I'm running the mothur MiSeq protocol (iMac, macOS Sierra, mothur 1.38, oodles of disk space) on a set of samples (~80 sputum samples from patients with cystic fibrosis). Everything was going great until I hit the cluster command, where I get this error message:

[ERROR]: HWI-HWI-M04771_47_000000000-AUJLE_1_1105_19621_21354 is not in your count table. Please correct.

I then tried the cluster.split command and had the exact same error message. Both commands appear to execute properly until the error message hits. The dist file is about 207.6 GB; the count table is about 19 MB.

So I’m at a loss as to what to do. Thoughts?

Thanks in advance!

stevewhitemd · October 13, 2016, 5:20pm

Let me add: the sequence noted in the error message indeed is not in the count table. Would it be as simple as adding it, along with values of ‘0’ for each of the samples?

stevewhitemd · October 13, 2016, 5:56pm

So I took my own advice and added that sequence to the count table. I then re-ran cluster:

cluster(column=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table)

For which I got this error message:

[ERROR]: HWI-M04771_47_000000000AUJLE_1_1106_15430_14650 is not in your count table. Please correct.

So I could see doing this repeatedly, one at a time. The dist file is 207.6 GB so I sure can’t load that into a text editor. Thoughts?
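One way to avoid fixing names one at a time is to stream the dist file line by line and collect every name that is absent from the count table in a single pass. The following is a minimal sketch, not part of mothur. It assumes the column-format dist file has three whitespace-separated fields per line (two sequence names and a distance) and that the count table has one header line followed by rows whose first field is the sequence name. The function names and the tiny in-memory data are hypothetical.

```python
import io

def load_count_names(count_file):
    """Return the set of sequence names (first column) in a count table."""
    names = set()
    next(count_file, None)  # skip the single header line (assumed)
    for line in count_file:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names

def missing_from_count(dist_file, known):
    """Stream a column-format dist file and collect names not in `known`."""
    missing = set()
    for line in dist_file:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        for name in fields[:2]:
            if name not in known:
                missing.add(name)
    return missing

# Tiny in-memory stand-ins for the real files (placeholder names and header).
count_text = "name\ttotal\tsampleA\tsampleB\nseq1\t1\t1\t0\nseq2\t2\t1\t1\n"
dist_text = "seq1 seq2 0.01\nseq1 seqX 0.02\nseqY seq2 0.03\n"

known = load_count_names(io.StringIO(count_text))
missing = missing_from_count(io.StringIO(dist_text), known)
print(sorted(missing))
```

With real files, the same functions can be given handles from `open(...)`. Only one line of the dist file is held in memory at a time, so the 207.6 GB size matters for run time but not for RAM.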

stevewhitemd · October 13, 2016, 6:25pm

Another update: in inspecting the count table, after the header line each line of the table (>153000 lines in mine) looks like this:

HWI-M04771_47_000000000-AUJLE_1_1105_19621_21354 1 0 0 …

Where there is an identifier for the sequence followed by the total count (here, 1) and the count for each sample.

The error message I received was:

[ERROR]: HWI-HWI-M04771_47_000000000-AUJLE_1_1105_19621_21354 is not in your count table. Please correct.

Notice the doubled “HWI-HWI”? I checked the count table and there are no entries with “HWI-HWI”. So now I’m wondering if the issue isn’t a missing sequence but a bug somewhere.
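To test the doubled-prefix idea across all missing names rather than by eye, one could check whether each missing name becomes a valid count-table name once its repeated leading token is collapsed. This is a minimal sketch under the assumption that the repeated token is separated by a hyphen, as in the error message above. The helper names are hypothetical, and the check says nothing about why the doubled name was produced.

```python
def collapse_doubled_prefix(name, sep="-"):
    """If the first two sep-delimited tokens are identical, drop one of them."""
    parts = name.split(sep)
    if len(parts) >= 2 and parts[0] == parts[1]:
        return sep.join(parts[1:])
    return name

def match_to_count_table(missing, known):
    """Map each missing name to the count-table name it collapses to, if any."""
    matches = {}
    for name in missing:
        collapsed = collapse_doubled_prefix(name)
        if collapsed != name and collapsed in known:
            matches[name] = collapsed
    return matches

# The name from the error message and the matching count-table entry.
missing = {"HWI-HWI-M04771_47_000000000-AUJLE_1_1105_19621_21354"}
known = {"HWI-M04771_47_000000000-AUJLE_1_1105_19621_21354"}

for bad, good in match_to_count_table(missing, known).items():
    print(bad, "->", good)
```

If every missing name maps back to an existing entry this way, that points to the names being altered somewhere upstream rather than to sequences genuinely absent from the count table.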

Kendra · October 13, 2016, 9:21pm

Search for that seq in your dist file. You could try deleting that seq from the dist.

stevewhitemd · October 14, 2016, 1:32am

Ok. It’s a 207 GB file; it’s not going to fit in any text editor I have. Suggestions?

Also, am re-running cluster.split to see if perhaps I goofed up the first time.

Kendra · October 14, 2016, 2:31pm

I’d always suggest cluster.split for any next gen datasets.

You can use sed to find the offending sequence. Something like this will remove every line where it’s found:

sed -e '/F4Q4SKU0…/d' fung.dist > fung1.dist
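When more than one name is affected, the same line-removal idea can be applied to a whole set of names in one pass. The sketch below is a Python stand-in for the sed approach, not a mothur feature. It assumes a column-format dist file (two names and a distance per line) and compares whole name fields rather than substrings. The function name and sample data are hypothetical.

```python
import io

def filter_dist(src, dst, drop_names):
    """Copy dist lines from src to dst, skipping any line that mentions a
    name in drop_names. Returns (lines kept, lines dropped)."""
    kept = dropped = 0
    for line in src:
        fields = line.split()
        if len(fields) >= 2 and (fields[0] in drop_names or fields[1] in drop_names):
            dropped += 1
            continue
        dst.write(line)
        kept += 1
    return kept, dropped

# In-memory stand-ins for the input and output dist files (placeholder names).
src = io.StringIO("seq1 seq2 0.01\nseq1 seqX 0.02\nseqY seq2 0.03\n")
dst = io.StringIO()

kept, dropped = filter_dist(src, dst, {"seqX", "seqY"})
print(kept, dropped)
print(dst.getvalue(), end="")
```

Matching on whole fields avoids removing lines for a different sequence whose name merely contains the pattern. As with the sed command, the dropped distances are gone from the clustering input, so it is worth checking how many lines were removed before re-running cluster.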

stevewhitemd · October 15, 2016, 1:16am

I’ll take a look.

One quick further question: how long does it take for cluster.split to work with very large dist files? It generated the various smaller dist and temp files fairly quickly but now has been sitting for quite a while with no (obvious) activity. How long do I give it?

Kendra · October 17, 2016, 1:43pm

Unfortunately, you can’t know that till you’ve done it. Generally, when clustering >100 samples on 4 processors (my server has 512 GB RAM), it will take between 12 hrs and 3 days for the whole SOP to run.

Violeta · October 17, 2016, 1:50pm

Hi, I have the same problem. I am running cluster.split after following the exact protocol from the MiSeq SOP; the only difference is that I’m using 18S. This is the command:

cluster.split(fasta=889trim.contigs.good.unique.good.filter.precluster.pick.pick.fasta,count=889trim.contigs.good.unique.good.filter.precluster.denovo.uchime.pick.pick.count_table,taxonomy=889trim.contigs.good.unique.good.filter.precluster.pick.nr_v123.wang.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.2, processors=8)

I have a total of 2591884 sequences, of which just 537348 are unique.

After 3 days of running, it stops without producing an output file, and the only error I get is this:

[ERROR]: M03540_58_000000000-M03540_58_000000000-AK3J9_1_2107_9877_3118 is not in your count table. Please correct.

stevewhitemd · October 18, 2016, 1:14am

So cluster.split took just under 2 days (2 processors, 24 GB) for me. Perhaps in the future the command could have some sort of progress indicator to let the user know that it’s working and that Mothur hasn’t crashed. Just a thought.
