I’m using 1.20.1 on OSX and saw this today. Before I normalize, my shared file (at 0.03) contains 3426 OTUs. After normalizing, it contains 3397. It seems to me that the total number of OTUs should not change. Is this a bug or am I missing something?
0.03 S0001 3426 0 0…
0.03 S0001 3397 0 0…
Digging through the source, I found this function in the implementation of normalize shared:
However, if I look at my shared file, I do not have any OTUs that are not represented by at least one sample.
Also, this doesn’t happen with the z-score normalization
In normalize.shared we find the size of the smallest sequence collection or take the number you give us. We then remove any group that has fewer sequences than that number. With the remaining sequences, we find the relative abundance of every OTU in every group and then multiply that relative abundance by the normalization number. Then we round everything. What you’re probably seeing is that there are a few singleton OTUs that are getting normalized down to zero.
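For anyone following along, the steps described above can be sketched roughly like this in Python. This is only an illustration, not mothur's actual code: the function name `normalize_totalgroup`, the dict-of-lists layout, and the round-half-up rule are all assumptions made for the example.

```python
import math


def normalize_totalgroup(counts, norm=None):
    """Sketch of the totalgroup-style normalization described above.

    counts: dict mapping group (sample) name -> list of per-OTU read counts.
    norm:   normalization number; defaults to the smallest group total.
    """
    totals = {group: sum(values) for group, values in counts.items()}
    if norm is None:
        norm = min(totals.values())

    normalized = {}
    for group, values in counts.items():
        # Groups with fewer sequences than the normalization number are removed.
        if totals[group] < norm:
            continue
        # Relative abundance times norm, rounded half up (assumed rounding rule).
        normalized[group] = [
            math.floor(c / totals[group] * norm + 0.5) for c in values
        ]
    return normalized


example = {"S1": [100, 9900], "S2": [1, 999], "S3": [1, 9999]}
result = normalize_totalgroup(example)
```

With these made-up counts the normalization number is 1,000, so the singleton in S3 (1 read out of 10,000) scales to 0.1 and rounds to zero, which is the effect described above.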
In contrast, the z-score normalization scales everything to have a mean relative abundance of zero and a standard deviation of 1.
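As a rough illustration of why the z-score method would not drop OTUs, here is a minimal Python sketch. It assumes the scaling is done per OTU across groups and uses the sample standard deviation; both are assumptions for the example, and `zscore_otu` is a hypothetical helper, not a mothur function.

```python
import statistics


def zscore_otu(values):
    """Scale one OTU's values across groups to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; an assumption for this sketch
    if sd == 0:
        # A constant OTU has no spread, so every z-score is taken as zero here.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


scores = zscore_otu([1, 2, 3])
```

Because nothing is rounded to an integer count, a rare OTU ends up with small nonzero scores instead of collapsing to zero everywhere.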
Hope this helps,
Thanks. Just to confirm, the process is as follows. I only ask, because I’m doing this on my own and getting some different results.
Let’s say that the sample with the fewest reads has 1,000 reads in it. For OTU A, Sample 1 has 100 reads in that OTU and has a total read count of 10,000. Upon normalizing using totalgroup, Sample 1’s OTU A read count becomes: (100/10,000) * 1,000 = 10.
If for some reason the normalized count drops below 0.5, then it’s rounded down to zero, checked by some process, and if all the sample counts for that OTU have counts of zero, that OTU is eliminated.
Is that right?
I’ve posted a sample file so you can see what I mean here: http://db.tt/8pxhO2N. Mothur is dropping 29 OTUs because of the zeros across all samples, but I’m dropping 555. You can see that things match up pretty well at the beginning, but get kind of screwy later in the file.
Also, mothur seems to be shifting left when OTUs are deleted, and therefore losing original OTU indices. Won’t this cause a discrepancy between the list and shared files downstream?
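To make the index concern concrete, here is a small Python sketch of dropping all-zero OTUs while carrying an explicit label for each column, so the surviving OTUs keep their original identity. The labels such as `Otu1` and the function `drop_empty_otus` are hypothetical; they are only there to show the idea.

```python
def drop_empty_otus(labels, counts):
    """Remove OTUs that are zero in every group, keeping original labels.

    labels: list of OTU labels, one per column (hypothetical names).
    counts: dict mapping group name -> list of per-OTU counts.
    """
    keep = [
        i for i in range(len(labels))
        if any(values[i] != 0 for values in counts.values())
    ]
    kept_labels = [labels[i] for i in keep]
    kept_counts = {
        group: [values[i] for i in keep] for group, values in counts.items()
    }
    return kept_labels, kept_counts


labels = ["Otu1", "Otu2", "Otu3"]
counts = {"S1": [5, 0, 2], "S2": [1, 0, 0]}
kept_labels, kept_counts = drop_empty_otus(labels, counts)
```

Here the second column is removed, but the third column is still labelled `Otu3` rather than silently becoming the new second OTU.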
I’d be happy to share code if you want to see how this is going on my end.
Could you send the original shared file to email@example.com?
I have looked into the normalize.shared bug you found. You are correct, mothur should have been dropping 555 OTUs. The fix will be part of mothur 1.21.0, releasing next month. We will also add a change to keep the original OTU numbers, so that it will be easier to track an OTU in downstream analysis. Thanks for bringing this bug to our attention.