Cluster command missing "unique" in output

Nicolas · August 7, 2020, 8:17pm

Good day
I just ran the cluster command using the output from my pairwise.seqs command. Here is the command that I used:

cluster(column=final.dist, count=final.count_table, method=nearest, cutoff=0.03)

And then I get the following output:
final.nn.list

The annotated workflow that I’m following states that my output should be “final.nn.unique_list” instead. This is a problem because, when I opened the large .cons file containing all my OTUs and samples, there seemed to be a massive overlap in taxonomy. If it were “unique”, I’m assuming all the taxa names would have been unique. I am a bit lost on how to fix this issue.

Many thanks in advance!

Best
Nicolas

pschloss · August 10, 2020, 4:39pm

To get the unique level, you’d either need to follow the instructions in the MiSeq SOP for generating ASVs (best option) or give the cluster command cutoff=0.00. I’m also not sure what a good argument would be for using nearest neighbor clustering. That tends to form chains where sequences that are far apart can be clumped together.

Pat
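
The chaining effect mentioned above can be illustrated with a small sketch. This is not mothur's implementation; it is a naive agglomerative clustering in Python on made-up distances, where the function name `cluster`, the sequence names A to D, and the toy distance values are all assumptions for illustration. It compares nearest neighbor (single linkage) with furthest neighbor (complete linkage) at a 0.03 cutoff.

```python
from itertools import combinations

def cluster(names, dist, cutoff, method="nearest"):
    """Naive agglomerative clustering (illustrative only, not mothur's code).

    nearest  = single linkage: cluster distance is the smallest pairwise distance
    furthest = complete linkage: cluster distance is the largest pairwise distance
    """
    clusters = [frozenset([n]) for n in names]
    link = min if method == "nearest" else max

    def d(a, b):
        return link(dist[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        a, b = min(combinations(clusters, 2), key=lambda pair: d(*pair))
        if d(a, b) > cutoff:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical sequences spaced 2% apart along a line: A-B, B-C and C-D
# each differ by 0.02, but A and D differ by 0.06.
pos = {"A": 0, "B": 2, "C": 4, "D": 6}
names = list(pos)
dist = {frozenset((x, y)): abs(pos[x] - pos[y]) / 100
        for x, y in combinations(names, 2)}

nearest = cluster(names, dist, cutoff=0.03, method="nearest")
furthest = cluster(names, dist, cutoff=0.03, method="furthest")
print("nearest: ", nearest)
print("furthest:", furthest)
```

With these toy distances, nearest neighbor links A to D through the intermediate sequences even though A and D are 0.06 apart, while furthest neighbor keeps every pair within a cluster at or below the cutoff.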

Nicolas · August 10, 2020, 7:02pm

Hi Pat

Thanks for getting back to me. I think I should’ve phrased my question differently. I’m actually fine with the outcome of my OTU table, but what I’m mainly trying to figure out is how to collapse the redundant OTUs in my OTU table. From my cluster command, I am trying to get “unique” into the output filename somehow. Is there any specific option or setting in the cluster command that would enable this?

Again, thanks for all the feedback. Your tips on the precluster problem a week ago helped a lot.

Cheers

pschloss · August 10, 2020, 7:47pm

Sorry, I don’t understand what you’re trying to do. The output of unique.seqs/pre.cluster would be the unique’d data. There shouldn’t be any redundant OTUs in the shared file.

Pat

Nicolas · August 13, 2020, 8:50pm

Hi Pat
Sorry for the confusion, but nonetheless thanks for your responses. I did run unique.seqs and pre.cluster in previous steps and used those outputs for subsequent steps. It just seems unlikely that there can be 11,000 different fungal OTUs in my dataset. If the taxonomic assignments for two different rows are the same, with the same size (this is from my final.nn.0.03.pick.0.03.cons table), could they still be unique based on some other parameter?

pschloss · August 17, 2020, 5:21pm

We generally expect multiple OTUs to have the same taxonomy, since they may represent a taxonomic resolution that is finer than genus. I would expect an ASV/ESV to be even finer scale, further increasing the likelihood of multiple sequences with the same taxonomy.
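
One way to check this on a consensus taxonomy table is to group the rows by taxonomy string and confirm that the OTU labels themselves never repeat. The sketch below is a minimal Python example, assuming a tab-separated layout with OTU, Size and Taxonomy columns; the rows in `SAMPLE` are made up, and the helper name `otus_by_taxonomy` is hypothetical.

```python
import csv
import io
import re
from collections import defaultdict

# Made-up rows in the assumed three-column layout (OTU, Size, Taxonomy).
SAMPLE = (
    "OTU\tSize\tTaxonomy\n"
    "Otu0001\t120\tFungi(100);Ascomycota(100);Sordariomycetes(99);\n"
    "Otu0002\t45\tFungi(100);Ascomycota(100);Sordariomycetes(97);\n"
    "Otu0003\t45\tFungi(100);Ascomycota(100);Sordariomycetes(91);\n"
    "Otu0004\t10\tFungi(100);Basidiomycota(100);Agaricomycetes(88);\n"
)

def otus_by_taxonomy(handle):
    """Map each taxonomy string (confidence scores removed) to its OTUs."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        # Drop the "(NN)" confidence values so identical lineages compare equal.
        lineage = re.sub(r"\(\d+\)", "", row["Taxonomy"])
        groups[lineage].append((row["OTU"], int(row["Size"])))
    return dict(groups)

groups = otus_by_taxonomy(io.StringIO(SAMPLE))

# Lineages assigned to more than one OTU.
shared = {tax: otus for tax, otus in groups.items() if len(otus) > 1}

# Every OTU label should still be distinct, even when taxonomy and size match.
all_labels = [otu for otus in groups.values() for otu, _ in otus]

for tax, otus in shared.items():
    print(tax, "->", otus)
print("duplicate OTU labels:", len(all_labels) - len(set(all_labels)))
```

In the sample, Otu0002 and Otu0003 have the same lineage and the same size but are still separate OTUs, because OTUs are defined by sequence distance and not by the name assigned to them. To run it on a real file, an open file handle can be passed in place of the `io.StringIO` object.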

Nicolas · August 18, 2020, 1:51am

Dear Pat
Again, thanks for all of this invaluable information. I did check it with ASVs in addition to OTUs and yes, you’re right, it did give an even finer scale. I guess I was not aware that sequences could be unique while sharing the same taxonomy names, but knowing that now helps me a lot.

Cheers.
