normalize.shared removes OTUs?

cfriedline · July 7, 2011, 7:24pm

Hi,

I’m using mothur 1.20.1 on OS X and saw this today. Before I normalize, my shared file (at 0.03) contains 3426 OTUs. After normalizing, it contains 3397. It seems to me that normalizing should not change the total number of OTUs. Is this a bug or am I missing something?

Before:
0.03 S0001 3426 0 0…

After:
0.03 S0001 3397 0 0…

Thanks,
Chris

cfriedline · July 7, 2011, 7:50pm

Digging through the source, I found this function in the implementation of normalize shared:

NormalizeSharedCommand::eliminateZeroOTUS

However, if I look at my shared file, I do not have any OTUs that are not represented by at least one sample.
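One way to double-check that claim is to sum each OTU column across all groups and list any column whose total is zero. The sketch below assumes whitespace-separated data lines laid out like the "Before" line above (label, group, number of OTUs, then one count per OTU), a single distance label per file, and that a line starting with `label` is a header to skip. The function name `all_zero_otus` is hypothetical; this is not mothur's `eliminateZeroOTUS`.

```python
from typing import List


def all_zero_otus(shared_lines: List[str]) -> List[int]:
    """Return 0-based indices of OTU columns that are zero in every group.

    Assumes each data line is: label, group, numOtus, count_1 ... count_n,
    separated by whitespace, and that all lines share one distance label.
    A line whose first field is 'label' is treated as a header (assumption).
    """
    totals: List[int] = []
    for line in shared_lines:
        fields = line.split()
        if not fields or fields[0] == "label":
            continue  # skip blank lines and the (assumed) header line
        num_otus = int(fields[2])
        counts = [int(x) for x in fields[3:3 + num_otus]]
        if not totals:
            totals = [0] * num_otus
        # Accumulate each OTU's count across all groups.
        for i, c in enumerate(counts):
            totals[i] += c
    return [i for i, t in enumerate(totals) if t == 0]


lines = [
    "label\tGroup\tnumOtus\tOtu1\tOtu2\tOtu3",
    "0.03\tS0001\t3\t0\t5\t0",
    "0.03\tS0002\t3\t1\t0\t0",
]
print(all_zero_otus(lines))  # [2]: the third OTU is empty in both groups
```

An empty result on the raw shared file means no OTU is absent from every sample before normalizing, so any OTUs dropped afterwards must come from the normalization step itself.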

cfriedline · July 7, 2011, 8:35pm

Also, this doesn’t happen with the z-score normalization.

pschloss · July 8, 2011, 1:24pm

In normalize.shared we find the size of the smallest group (sequence collection) or take the number you give us. We then remove any group that has fewer sequences than that number. For the remaining groups, we find the relative abundance of every OTU in every group and then multiply that relative abundance by the normalization number. Then we round everything. What you’re probably seeing is that there are a few singleton OTUs that are getting normalized down to zero.
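The procedure Pat describes can be sketched as follows. This is a minimal illustration, not mothur's source: it assumes the shared table is held as a dict mapping group name to a list of OTU counts (same OTU order in every group), assumes rounding is half up, and uses a hypothetical function name `normalize_totalgroup`.

```python
import math
from typing import Dict, List, Optional, Tuple


def normalize_totalgroup(
    counts: Dict[str, List[int]], norm: Optional[int] = None
) -> Tuple[Dict[str, List[int]], List[int]]:
    """Sketch of the totalgroup-style normalization described above.

    Returns the normalized counts and the original indices of the OTUs kept.
    """
    totals = {g: sum(c) for g, c in counts.items()}
    if norm is None:
        norm = min(totals.values())  # size of the smallest group
    # Remove any group with fewer sequences than the normalization number.
    kept_groups = {g: c for g, c in counts.items() if totals[g] >= norm}
    if not kept_groups:
        return {}, []
    # Relative abundance times norm, then round half up (assumed rounding rule).
    scaled = {
        g: [math.floor(x * norm / totals[g] + 0.5) for x in c]
        for g, c in kept_groups.items()
    }
    n_otus = len(next(iter(scaled.values())))
    # An OTU survives only if at least one group still has a nonzero count.
    kept_otus = [i for i in range(n_otus) if any(v[i] > 0 for v in scaled.values())]
    result = {g: [v[i] for i in kept_otus] for g, v in scaled.items()}
    return result, kept_otus


counts = {"A": [100, 9899, 1], "B": [500, 500, 0]}
print(normalize_totalgroup(counts))
# ({'A': [10, 990], 'B': [500, 500]}, [0, 1])
```

In this toy table the third OTU is a singleton in group A (1 of 10,000 reads); scaled to 1,000 it becomes 0.1, rounds to zero, and since group B also has zero the OTU is removed, which is the effect Pat describes.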

In contrast, the z-score normalization scales everything to have a mean relative abundance of zero and a standard deviation of 1.
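For contrast, a z-score sketch is below. It assumes the standardization is done per OTU across groups on relative abundances, using the population standard deviation, and maps OTUs with no spread to 0.0; these details are assumptions, not mothur's documented behavior, and `zscore_by_otu` is a hypothetical name. Because the values are never rounded to integers, nothing is forced to zero and no OTU is dropped, which would be consistent with Chris's observation.

```python
import statistics
from typing import Dict, List


def zscore_by_otu(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Standardize each OTU's relative abundance across groups (mean 0, sd 1).

    Assumption: z-scores are per OTU across groups, population sd;
    OTUs whose relative abundance is identical in every group become 0.0.
    """
    groups = list(counts)
    rel: Dict[str, List[float]] = {}
    for g in groups:
        total = sum(counts[g])
        rel[g] = [x / total for x in counts[g]]  # relative abundance
    n_otus = len(rel[groups[0]])
    out = {g: [0.0] * n_otus for g in groups}
    for i in range(n_otus):
        column = [rel[g][i] for g in groups]
        mean = statistics.fmean(column)
        sd = statistics.pstdev(column)
        for g in groups:
            out[g][i] = (rel[g][i] - mean) / sd if sd > 0 else 0.0
    return out


print(zscore_by_otu({"A": [1, 3], "B": [3, 1]}))
# {'A': [-1.0, 1.0], 'B': [1.0, -1.0]}
```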

Hope this helps,
Pat

cfriedline · July 8, 2011, 3:24pm

Thanks. Just to confirm, is the process as follows? I only ask because I’m doing this on my own and getting different results.

Let’s say that the sample with the fewest reads has 1,000 reads in it. For OTU A, Sample 1 has 100 reads in that OTU and a total read count of 10,000. Upon normalizing using totalgroup, Sample 1’s OTU A read count becomes: (100/10,000) * 1,000 = 10.
If the normalized count drops below 0.5, it’s rounded down to zero. Then some process checks each OTU, and if every sample’s count for that OTU is zero, that OTU is eliminated.

Is that right?

Thanks,
Chris
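The arithmetic in Chris's example, and the 0.5 cutoff he describes, can be checked with a few lines. The helper names are hypothetical, and rounding exactly 0.5 up is an assumption about mothur. One pitfall worth noting for anyone reproducing this in Python: the built-in `round()` rounds halves to even, so it would send 0.5 to 0 and 2.5 to 2, which may not match.

```python
import math


def normalized_count(otu_reads: int, group_total: int, norm: int) -> float:
    # Relative abundance of the OTU in this group, scaled to the normalization size.
    # Multiplying before dividing keeps integer-derived halves like 0.5 exact.
    return otu_reads * norm / group_total


def round_half_up(x: float) -> int:
    # Below 0.5 rounds to zero; 0.5 and above rounds up (assumed rule).
    return math.floor(x + 0.5)


print(normalized_count(100, 10_000, 1_000))               # 10.0
print(round_half_up(normalized_count(4, 10_000, 1_000)))  # 0   (0.4 rounds down)
print(round_half_up(normalized_count(5, 10_000, 1_000)))  # 1   (0.5 rounds up)
print(round(0.5), round(2.5))                             # 0 2 (banker's rounding)
```

Differences in rounding rules between two implementations only affect counts that land exactly on .5, so they are unlikely to explain a gap as large as 29 versus 555 dropped OTUs on their own.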

cfriedline · July 8, 2011, 4:56pm

I’ve posted a sample file so you can see what I mean here: http://db.tt/8pxhO2N. Mothur is dropping 29 OTUs because of the zeros across all samples, but I’m dropping 555. You can see that things match up pretty well at the beginning, but get kind of screwy later in the file.

Also, mothur seems to be shifting left when OTUs are deleted, and therefore losing original OTU indices. Won’t this cause a discrepancy between the list and shared files downstream?
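One way to avoid the index shift Chris describes is to carry an explicit label with every OTU column and filter labels and counts together, so a surviving OTU keeps its original name instead of inheriting its new column position. The sketch assumes the table is a dict of group name to counts plus a parallel list of labels; the labels (`Otu1`, ...) and the function name are hypothetical illustrations, not mothur's file format or code.

```python
from typing import Dict, List, Tuple


def drop_zero_otus_keep_labels(
    labels: List[str], counts: Dict[str, List[int]]
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Remove OTUs that are zero in every group, keeping survivors' original labels."""
    keep = [i for i in range(len(labels)) if any(c[i] > 0 for c in counts.values())]
    new_labels = [labels[i] for i in keep]
    new_counts = {g: [c[i] for i in keep] for g, c in counts.items()}
    return new_labels, new_counts


labels = ["Otu1", "Otu2", "Otu3"]
counts = {"A": [3, 0, 1], "B": [2, 0, 0]}
print(drop_zero_otus_keep_labels(labels, counts))
# (['Otu1', 'Otu3'], {'A': [3, 1], 'B': [2, 0]})
```

Here the third OTU stays `Otu3` even though it is now the second column, so it can still be matched against the corresponding OTU in the list file.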

I’d be happy to share code if you want to see how this is going on my end.

westcott · July 15, 2011, 1:21pm

Hi Chris,
Could you send the original shared file to mothur.bugs@gmail.com?
Thanks,
Sarah

cfriedline · July 15, 2011, 6:34pm

Done.

westcott · July 18, 2011, 3:28pm

I have looked into the normalize.shared bug you found. You are correct: mothur should have been dropping 555 OTUs. The fix will be part of mothur 1.21.0, releasing next month. We will also add a change to keep the original OTU numbers, so that it will be easier to track an OTU in downstream analysis. Thanks for bringing this bug to our attention.

cfriedline · July 18, 2011, 6:25pm

Thanks!
