I’m trying to analyse 16S rRNA gene sequence data from 46 wine grape and wine samples (92 sequences).
In the pipeline I use there is the following command: cluster(column=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table)
It has already worked on other samples (olives and soil), but it doesn’t work on the dataset I’m running now.
The problem is that it gets stuck on this command, and even if I leave the program running for hours it doesn’t finish.
What is the result of summary.seqs just prior to making the distance matrix (i.e., how many uniques do you have)? Also, on what kind of hardware are you running your analysis?
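To illustrate why the number of uniques matters here: the number of sequence pairs that could need a distance grows roughly with the square of the number of unique sequences. The following is only a rough sketch of that upper bound; the function name and the example counts are made up for illustration, and the real distance file will usually be smaller because distances above the cutoff are not kept.

```python
def pairwise_distance_count(n_uniques: int) -> int:
    """Upper bound on the number of pairwise distances among n unique sequences."""
    # Each unordered pair of distinct sequences is counted once: n * (n - 1) / 2
    return n_uniques * (n_uniques - 1) // 2


# Hypothetical unique counts, not taken from this dataset
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9} uniques -> up to {pairwise_distance_count(n):,} pairwise distances")
```

A tenfold increase in uniques means roughly a hundredfold increase in pairs, which is why clustering can appear to hang when the error rate inflates the number of uniques.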
From the length of your assembled sequences (400 to 542 nt), it looks like you’re sequencing a non-V4 region, so your reads do not fully overlap with each other. I suspect you’re also using the 2x300 nt chemistry to sequence the data, which causes additional problems. Both of these issues will give you a much higher error rate than desired and inflate the number of unique reads you have. I’d encourage you to read this blog post for a better idea of what’s going on and what can be done about it.
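As a rough check on the overlap point, here is a small sketch of the arithmetic. It assumes both reads are the full 300 nt with no quality trimming and that the assembled length equals the fragment length; the function name is just for illustration.

```python
def expected_overlap(fragment_len: int, read_len: int = 300) -> int:
    """Number of bases covered by both reads of a pair (assumes untrimmed, full-length reads)."""
    # Two reads of read_len cover 2 * read_len bases in total; whatever exceeds
    # the fragment length is covered twice. The overlap cannot be longer than
    # the fragment itself and cannot be negative.
    return max(0, min(fragment_len, 2 * read_len - fragment_len))


# Assembled lengths reported in this thread: 400 to 542 nt
for length in (400, 542):
    overlap = expected_overlap(length)
    print(f"{length} nt fragment: {overlap} nt overlap "
          f"({overlap / length:.0%} of positions covered by both reads)")
```

Under those assumptions, a 400 nt fragment has 200 nt of overlap and a 542 nt fragment only 58 nt, so most positions are read once and sequencing errors there cannot be corrected by the other read.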