Cluster command issue

Hello everyone,

I’m trying to analyse 16S rRNA gene sequence data from 46 wine grape and wine samples (92 sequences).

In the pipeline I use there is the following command:
cluster(column=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table)

It has already worked with other samples (olives and soil), but with the dataset I’m running now it doesn’t!

The problem is that it gets stuck on this command, and even if I leave the program open for hours it doesn’t finish.

Any ideas??

Thank you


What is the result of summary.seqs just prior to making the distance matrix (i.e., how many uniques do you have)? Also, on what kind of hardware are you running your analysis?



These are the results of summary.seqs; I have 204149 uniques.

What exactly do you want to learn about the hardware I’m using?

Thank you once again!

Morning. It is simply to see whether you have enough memory or processing power for the number of unique sequences that you have.
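As a rough illustration of why the number of uniques matters, the sketch below counts how many pairwise distances are possible for a given number of unique sequences. The function names and the 4 bytes per stored distance are assumptions made for illustration, not mothur internals, and a distance cutoff normally keeps far fewer pairs than this worst case.

```python
def pairwise_count(n_unique: int) -> int:
    """Number of distinct sequence pairs among n_unique sequences."""
    return n_unique * (n_unique - 1) // 2


def worst_case_gib(n_unique: int, bytes_per_distance: int = 4) -> float:
    """Upper bound on storage if every pair were kept.

    bytes_per_distance is a hypothetical value chosen for illustration;
    the real footprint depends on the file format and the cutoff used.
    """
    return pairwise_count(n_unique) * bytes_per_distance / 2**30


if __name__ == "__main__":
    n = 204149  # uniques reported by summary.seqs in this thread
    print(f"{pairwise_count(n):,} possible pairs")
    print(f"about {worst_case_gib(n):.1f} GiB if every pair were stored")
```

For 204149 uniques this comes to roughly 20.8 billion possible pairs, which is why the count of uniques is the first thing to check against the available RAM.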

CPU: i7 9700 (3 GHz)
RAM: 16 GB

I tried running a smaller part of the dataset and it worked!

So do you think the memory is not enough?

From the length of your assembled sequences (400 to 542 nt), it looks like you’re sequencing a non-V4 region, so your reads do not fully overlap with each other. I suspect you’re also using the 2x300 nt chemistry to sequence the data, which causes additional problems. Both of these issues will give you a much higher than desired error rate and inflate the number of unique reads you have. I’d encourage you to read this blog post for a better idea of what’s going on and what can be done about it.
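To make the overlap point concrete, here is a minimal sketch that estimates how many positions of a contig are covered by both reads. It assumes 2x300 nt reads (suspected above, not confirmed) that are used at full length with no trimming; the function names are hypothetical.

```python
def overlap_nt(read_len: int, amplicon_len: int) -> int:
    """Positions covered by both the forward and the reverse read.

    Assumes both reads have the same length and start at opposite
    ends of the amplicon, with no quality trimming applied.
    """
    return max(0, min(amplicon_len, 2 * read_len - amplicon_len))


def overlap_fraction(read_len: int, amplicon_len: int) -> float:
    """Share of the amplicon that is supported by both reads."""
    return overlap_nt(read_len, amplicon_len) / amplicon_len


if __name__ == "__main__":
    read_len = 300  # assumed 2x300 nt chemistry
    for amplicon_len in (400, 542):  # contig length range from this thread
        nt = overlap_nt(read_len, amplicon_len)
        frac = overlap_fraction(read_len, amplicon_len)
        print(f"{amplicon_len} nt contig: {nt} nt overlap ({frac:.0%})")
```

Under these assumptions a 400 nt contig has 200 nt covered by both reads and a 542 nt contig only 58 nt, so most positions in the longer contigs rest on a single read and cannot be corrected by the other one.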

