Cluster command missing "unique" in output

Good day
I just ran the cluster command using the output from my pairwise.seqs command. Here is the command that I used:

cluster(column=final.dist, count=final.count_table, method=nearest, cutoff=0.03)

And then I get the following output:
final.nn.list

The annotated workflow that I’m following states that my output should be “final.nn.unique_list” instead. This is a problem because, when I opened the large .cons file containing all my OTUs and samples, there seemed to be a massive overlap in taxonomy. If it were “unique”, I assume all the taxon names would have been unique. I am a bit lost on how to fix this issue.

Many thanks in advance!

Best
Nicolas

To get the unique level, you’d either need to follow the instructions in the MiSeq SOP for generating ASVs (best option) or give the cluster command cutoff=0.00. I’m also not sure what a good argument would be for using nearest neighbor clustering. That tends to form chains where sequences that are far apart can be clumped together.

Pat
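
For readers wondering what the chaining mentioned above looks like, here is a minimal Python sketch. It is not mothur's implementation; it only assumes that nearest neighbor clustering joins two sequences whenever any pair across the groups is within the cutoff. The sequence names and the one-dimensional toy distances are made up for illustration.

```python
from itertools import combinations

def single_linkage(names, dist, cutoff):
    """Group names so that any pair within `cutoff` lands in the same cluster."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if dist(a, b) <= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())

# Hypothetical toy data: five sequences on a line, each 0.02 from its neighbor.
positions = {"seqA": 0.00, "seqB": 0.02, "seqC": 0.04, "seqD": 0.06, "seqE": 0.08}

def toy_dist(a, b):
    return abs(positions[a] - positions[b])

clusters = single_linkage(list(positions), toy_dist, cutoff=0.03)

# Largest distance between two members of the same cluster.
widest = max(toy_dist(a, b) for c in clusters for a, b in combinations(c, 2))

print(clusters)
print(widest)
```

In this toy case every sequence is within 0.03 of its neighbor, so all five chain into a single cluster even though the two ends are well beyond the 0.03 cutoff from each other.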

Hi Pat

Thanks for getting back to me. I think I should’ve phrased my question differently. I’m actually fine with the outcome of my OTU table, but what I’m mainly trying to figure out is how to collapse the redundant OTUs in my OTU table. From my cluster command, I am trying to get the “unique” label into the output filename somehow. Is there a specific option or parameter in the cluster command that would enable this?

Again, thanks for all the feedback. Your tips on the precluster problem a week ago helped a lot.

Cheers

Sorry, I don’t understand what you’re trying to do. The output of unique.seqs/pre.cluster would be the unique’d data. There shouldn’t be any redundant OTUs in the shared file.

Pat

Hi Pat
Sorry for the confusion, and thanks for your responses. I did run unique.seqs and pre.cluster in previous steps and used those outputs for subsequent steps. It just seems unlikely that there can be 11 000 different fungal OTUs in my dataset. If two different rows have the same taxonomic assignment and the same size (this is from my final.nn.0.03.pick.0.03.cons table), could they still be unique based on some other parameter?

We generally expect multiple OTUs to have the same taxonomy, since they may represent a taxonomic resolution that is finer than a genus. I would expect ASVs/ESVs to be even finer scale, further increasing the likelihood of multiple sequences with the same taxonomy.
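
One way to see how many OTUs share each taxonomy is to tally the taxonomy column directly. The Python sketch below assumes a tab-separated table with OTU, Size, and Taxonomy columns, as in a cons.taxonomy file; the OTU labels, sizes, and taxonomy strings are invented toy values, and `strip_confidence` is a hypothetical helper, not a mothur function.

```python
import csv
import io
from collections import Counter

# Toy stand-in for a cons.taxonomy file (tab-separated: OTU, Size, Taxonomy).
# All rows below are made up for illustration.
toy_table = (
    "OTU\tSize\tTaxonomy\n"
    "Otu001\t120\tFungi(100);Ascomycota(100);Sordariomycetes(98);\n"
    "Otu002\t120\tFungi(100);Ascomycota(100);Sordariomycetes(95);\n"
    "Otu003\t45\tFungi(100);Basidiomycota(100);Agaricomycetes(99);\n"
)

def strip_confidence(taxonomy):
    """Drop the '(NN)' confidence scores so only the taxon names are compared."""
    names = [level.split("(")[0] for level in taxonomy.split(";") if level]
    return ";".join(names)

# To use a real file, replace io.StringIO(toy_table) with an open file handle.
rows = list(csv.DictReader(io.StringIO(toy_table), delimiter="\t"))
otus_per_taxonomy = Counter(strip_confidence(r["Taxonomy"]) for r in rows)

# List taxonomies from most to least shared.
for taxonomy, n_otus in otus_per_taxonomy.most_common():
    print(n_otus, taxonomy)
```

Here Otu001 and Otu002 are distinct OTUs with the same size and the same taxon names; they are separate rows because their underlying sequences differ, not because the table contains duplicates.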

Dear Pat
Again, thanks for all of this invaluable information. I did check it with ASVs in addition to OTUs and yes, you’re right, it gave an even finer scale. I guess I was not aware that sequences could be unique while sharing the same taxonomy names, but knowing that now helps me a lot.

Cheers.