Statistical question - normalized v. absolute reads (SMRT)

Dear Pat and the mothur community,

I have been developing a method to do absolute quantification of 18S amplicons using the SMRT cell platform. I have managed to get it to work… but face a conceptual problem in my analysis that I cannot get my head around.

I am working with a dataset that has most of the reads present in the top 30 OTUs. Following standard protocols, I normalized my dataset to the library with the lowest number of sequences and, before I did any tests, I noted that my third most abundant OTU (0003) was most abundant at, let’s call it, location X. However, when I calculated absolute reads of 18S per mL of seawater from the non-normalized dataset (because it is absolute quantification), I noted that suddenly there were more sequences per mL of seawater at locations Y and Z. This is for various reasons, including the number of wells occupied in any SMRT cell run and the success of the sequencing… although this did not affect the efficiency of my quantification using my standards, which still came out with high accuracy. It meant that stations Y and Z had a higher conversion factor from reads to sequences than X.

With that in mind, I went back to the normalized OTU read dataset and did a UniFrac and saw that a completely different station (let’s call it location A) was significantly different from X, Y and Z, which were in fact “the same”. I went ahead and plotted out the significant drivers using the Pearson method (all following the 454 SOP). The significant drivers of my trend were not OTU 0003; in fact, they were quite a lot of things that were not in the top 30 OTUs. To make it simple, most of my top 30 OTUs are diatoms, but the drivers include ciliates and other protozoan grazers in my top, say, 50 or so OTUs that make station A different from all the other locations.

To get to the root of the question: I have not checked whether using normalized reads v. actual sequences per mL of seawater changes the dominance of most OTUs in my samples. But the question I have is… knowing that it can change the prevalence of OTUs… is it still OK for me to use normalized reads even when the sequencing success has differed between SMRT cells for each library?

I’m in something of a conundrum and must admit that stats is not necessarily my forte, but I welcome your thoughts on this matter.

All the best,

Bethan

Hi Bethan,

You’re trying to convert relative abundance to absolute abundance - right? I think you’re in a unique position of being able to do such an analysis since so many people studying the gut, soil, etc. are forced to just use relative abundance since we can’t get an absolute number of cells. I guess that I would still subsample everything to a common number of reads, and then multiply by the number of cells. I would present the paired analysis using both the relative and absolute abundances.
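
A minimal sketch of that idea, outside of mothur, might look like the following. It assumes you have a vector of OTU counts per library and a measured total per sample (cells, or 18S copies per mL in this case). The station names, counts, and totals are made-up numbers for illustration, and `subsample_counts` and `to_absolute` are hypothetical helper names, not mothur commands.

```python
import random

def subsample_counts(counts, depth, seed=0):
    """Draw `depth` reads without replacement from a vector of OTU counts."""
    if depth > sum(counts):
        raise ValueError("depth exceeds library size")
    rng = random.Random(seed)
    # Expand the counts into one OTU label per read, then sample reads.
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    picked = rng.sample(reads, depth)
    out = [0] * len(counts)
    for otu in picked:
        out[otu] += 1
    return out

def to_absolute(sub_counts, total_per_ml):
    """Turn subsampled counts into proportions and scale by the measured total."""
    depth = sum(sub_counts)
    return [c / depth * total_per_ml for c in sub_counts]

# Hypothetical OTU counts per library (three OTUs, two stations).
libraries = {
    "X": [600, 300, 100],
    "Y": [1200, 500, 300],
}
# Hypothetical measured totals (e.g. 18S copies per mL of seawater).
totals_per_ml = {"X": 1.0e4, "Y": 5.0e4}

# Subsample every library to the size of the smallest one.
depth = min(sum(c) for c in libraries.values())

absolute = {}
for name, counts in libraries.items():
    sub = subsample_counts(counts, depth)
    absolute[name] = to_absolute(sub, totals_per_ml[name])

for name, values in absolute.items():
    print(name, [round(v) for v in values])
```

With this approach every library contributes the same number of reads to the relative abundances, and differences in sequencing success between SMRT cells enter only through the measured totals.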

Hope this helps…
Pat

Hi Pat,

Yes, that is correct… and thank you for your input! Naive question but is there any way to force mothur to do the NMDS using a file with the absolute quantification? Does it only use the matrix provided?

Bethan

NMDS takes in a distance matrix. So you can calculate a distance matrix however you want and then feed that into the nmds command.
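
As a rough sketch of what "however you want" could look like, the snippet below computes Bray-Curtis distances from absolute abundances and lays them out as a lower-triangle PHYLIP-style matrix. Bray-Curtis is only one possible choice, the abundances are made-up numbers, and `bray_curtis` and `lower_triangle_lines` are hypothetical helper names. Check the layout against a distance file that mothur itself produced before relying on it.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(a) + sum(b)
    return num / den if den else 0.0

def lower_triangle_lines(abund):
    """Build lower-triangle matrix lines: sample count, then name + distances."""
    names = list(abund)
    lines = [str(len(names))]
    for i, name in enumerate(names):
        # Row i holds the distances to all samples listed before it.
        dists = [f"{bray_curtis(abund[name], abund[names[j]]):.6f}" for j in range(i)]
        lines.append("\t".join([name] + dists))
    return lines

# Hypothetical absolute abundances (e.g. 18S copies per mL) for three OTUs.
abund = {
    "X": [6000, 3000, 1000],
    "Y": [30000, 12500, 7500],
    "A": [1000, 1000, 8000],
}

lines = lower_triangle_lines(abund)
print("\n".join(lines))
```

If those lines are saved to a file (the name `absolute.lt.dist` is just an example), it should be possible to pass it to mothur with `nmds(phylip=absolute.lt.dist)`.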

This is awesome, thanks Pat!