eye rolling "unique" question

jontarn · April 21, 2014, 7:53pm

Hi;

I know this has probably been referenced many times, and I tried to read through the FAQ, but it still made no sense to me.

"unique" means each sequence in an OTU is identical, correct? So why is it that when I cluster and look at the unique level, I still end up with fewer sequences than in my unique.filter.pick fasta, and some sequences still get clustered into OTUs?

I tried looking for the answer on the forums, though admittedly not very hard, and came up short.

thanks as always.

pschloss · April 22, 2014, 2:27pm

Because it looks like you ran unique.seqs before filter.seqs. After filter.seqs, run unique.seqs (and pre.cluster) and you should be good to go.
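To see why the order matters, here is a toy Python sketch (not mothur's code). It assumes that filtering removes some alignment columns; the helper names and the choice of which column to drop are hypothetical. Sequences that were distinct before filtering can become identical afterwards, so a unique step run only before the filter leaves duplicates behind.

```python
# Toy illustration only; this is not how mothur is implemented.

def unique_seqs(seqs):
    """Collapse identical sequences, keeping first-seen order."""
    return list(dict.fromkeys(seqs))

def filter_columns(seqs, drop):
    """Remove the given alignment columns (0-based) from every sequence."""
    return ["".join(c for i, c in enumerate(s) if i not in drop) for s in seqs]

# Three aligned sequences that are all distinct before filtering.
aligned = ["ACGT-A", "ACGTTA", "ACGT-C"]

uniq_before = unique_seqs(aligned)            # still 3 sequences
filtered = filter_columns(uniq_before, {4})   # hypothetical: column 4 is dropped
uniq_after = unique_seqs(filtered)            # the first two are now identical

print(len(uniq_before), len(uniq_after))
```

In this toy case the first two sequences differ only in the dropped column, so they collapse into one once the unique step is rerun on the filtered output.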

jontarn · April 24, 2014, 9:37pm

that makes perfect sense thanks so much.

BMott · August 20, 2014, 8:52pm

Hello,
I have a related question about the cluster() command’s output. I’m seeing different “unique” OTU counts in my *.rabund files depending on the method options used.

My “method=furthest” and “=average” outputs indicate 6,391 unique-distance OTUs, but “=nearest” indicates only 3,426 unique-distance OTUs.

I thought the clustering method should only impact the OTU binning at higher thresholds (0.01, 0.02, etc.) because the “unique” bins only contained identical sequences?

Could this be a calculation error, or have I misunderstood something?

I’ve read about the distinction between “unique” and “0.00” (and the fact that 0.0049 would be rounded down to 0.00), but I’m not seeing any “0.00” level OTU counts in my *.rabund files anyway.

Background info:
I’m basically following the 454 SOP, but I have ITS data that can’t be aligned (as far as I know), so I’ve skipped the align.seqs() and pre.cluster() steps.
I used the unique.seqs() command (count= 6,893).
I used the pairwise.seqs() command as an alternative to dist.seqs().

I’m also curious about the discrepancy between my unique.seqs count (6893) and the unique OTU counts (6391) after running cluster(), but I’m assuming this is possible because I calculated distances with “countends=F”. In this case, I think two sequences of differing length might be counted as distinct via the unique.seqs() command and yet produce a distance of 0.000 across their aligned space.

Thanks for any info you can provide.
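The countends=F hypothesis in the post above can be sketched in Python. This is a toy distance function written for illustration, not the calculation pairwise.seqs() actually performs; the function name, the gap handling, and the example sequences are all assumptions.

```python
# Toy sketch of the hypothesis; not mothur's distance calculation.

def toy_distance(a, b, countends=True):
    """Fraction of mismatched columns between two equal-length aligned strings.

    With countends=False, columns that fall inside either sequence's
    leading or trailing gap run are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")

    def bounds(s):
        # Index of the first non-gap column and one past the last non-gap column.
        return len(s) - len(s.lstrip("-")), len(s.rstrip("-"))

    start, end = 0, len(a)
    if not countends:
        (sa, ea), (sb, eb) = bounds(a), bounds(b)
        start, end = max(sa, sb), min(ea, eb)

    cols = list(zip(a[start:end], b[start:end]))
    if not cols:
        return 0.0
    return sum(x != y for x, y in cols) / len(cols)

long_seq = "ACGTACGT"
short_seq = "ACGTAC--"   # same sequence, two bases shorter at the end

print(long_seq != short_seq)                               # distinct strings
print(toy_distance(long_seq, short_seq, countends=True))   # end gaps counted
print(toy_distance(long_seq, short_seq, countends=False))  # end gaps ignored
```

Under these assumptions the two strings would survive a string-identity dereplication as separate sequences, yet their distance is zero when terminal gaps are ignored.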

pschloss · August 22, 2014, 8:30pm

A couple of years ago we switched to a hard cutoff, so 0.00 = unique and 0.0049 would possibly be the 0.01 cluster (assuming there was nothing between 0.0050 and 0.01). So I’m not sure why the different methods would give you different results, unless you’re using a very old version of mothur. When you look at the top of your mothur session, what version does it say you’re using?

Pat
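One way to picture the difference between rounding and a hard cutoff is the Python sketch below. It is only one reading of the description above, not mothur's code; the function names and the precision value of 100 are assumptions.

```python
import math

# Toy sketch: one reading of "rounding" versus "hard cutoff" labelling.

def label_rounded(d, precision=100):
    """Older behaviour as described: round the distance to the nearest label."""
    return math.floor(d * precision + 0.5) / precision

def label_hard_cutoff(d, precision=100):
    """Hard cutoff: a distance only reaches a label once it does not exceed it."""
    return math.ceil(d * precision) / precision

d = 0.0049
print(label_rounded(d))         # rounds down to the 0.00 label
print(label_hard_cutoff(d))     # lands at the 0.01 label instead
print(label_hard_cutoff(0.0))   # only a true zero distance stays at 0.00
```

With this reading, only pairs at distance exactly 0 can share a unique-level OTU, which is the expectation stated in the question.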

BMott · September 2, 2014, 8:58pm

Thanks for the reply.

I was using a slightly outdated version.

However, I have just repeated the cluster(method=nearest) and cluster(method=average) runs with the newest Windows version (v.1.33.3, 64bit), and I’m seeing the exact same results.

In any case, I’m happy with the method=average OTU data, so this is mostly an academic (or bug reporting) question.

I’d be happy to upload the distance table if that’s helpful, but I think it’s ~800 MB. (I ran pairwise.seqs() with “cutoff=1.0” because my first attempt with “cutoff=0.10” was running into the known issue where cluster(method=average) provides only unique-level OTUs.)
