Hello all
Here’s an issue that cropped up in the review of a manuscript of ours at a journal. This is my first submitted microbiome-related manuscript. Without getting into journal-specific issues, a reviewer made a couple of comments about our use of MiSeq and MOTHUR that I thought I would bring here. Our manuscript looks at the lung microbiome in a cohort of control subjects and subjects with asthma, a situation in which every sample has low biomass. That is unavoidable; it is what it is. We extracted our DNA, did V4-region PCR amplification, sequenced on the MiSeq, and then used MOTHUR’s MiSeq pipeline (per the FAQ) to get us to the point of data analysis. Our friendly reviewer raised three vehement issues, and I’d like a little advice here (sorry for the length):
Issue #1: The reviewer raises a point made in a paper published in November by Salter et al. [1]. Our dataset includes some negative, reagent-only controls (DNA extraction kits, etc.). As Salter et al. note, while these reagents are thought to be “sterile”, in reality they contain some sequences that might confound an analysis. Indeed, for some (not all) of our reagent controls, although the amount/concentration of DNA submitted for analysis was generally lower (per NanoDrop) than for our patient samples, it’s hard to tell the nseqs and sobs of these controls apart from those of at least some of the patient samples. I’d like to hear what others are doing about reagent controls in low-biomass situations: does one “subtract out” these sequences from one’s samples, or are there other ways to handle this? One of my colleagues who is looking into the issue says that it’s quite “complex”, and I agree, but that doesn’t help me deal with this reviewer.
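To make the “subtract out” idea concrete, here is a minimal sketch of one naive way to flag OTUs that look like reagent contaminants: compare each OTU’s mean relative abundance in the reagent controls with its mean relative abundance in the patient samples. This is only an illustration of the question, not an established method and not part of MOTHUR; the function names, the `ratio` parameter, and the toy counts are all made up for the example.

```python
# Naive contaminant flagging by comparing relative abundances.
# All names and numbers here are hypothetical, for illustration only.

def relative_abundance(counts):
    """Convert one sample's {otu: read_count} into {otu: fraction of reads}."""
    total = sum(counts.values())
    if total == 0:
        return {otu: 0.0 for otu in counts}
    return {otu: n / total for otu, n in counts.items()}


def mean_relative_abundance(samples):
    """Average each OTU's relative abundance over a list of samples.

    An OTU absent from a sample counts as 0 for that sample.
    """
    if not samples:
        raise ValueError("need at least one sample")
    fractions = [relative_abundance(s) for s in samples]
    otus = set()
    for s in samples:
        otus.update(s)
    return {
        otu: sum(f.get(otu, 0.0) for f in fractions) / len(fractions)
        for otu in otus
    }


def flag_possible_contaminants(patient_samples, control_samples, ratio=1.0):
    """Return OTUs at least `ratio` times as abundant (on average) in the
    reagent controls as in the patient samples.

    `ratio` is an arbitrary cutoff chosen for this sketch.
    """
    patient = mean_relative_abundance(patient_samples)
    control = mean_relative_abundance(control_samples)
    flagged = [
        otu
        for otu, frac in control.items()
        if frac > 0 and frac >= ratio * patient.get(otu, 0.0)
    ]
    return sorted(flagged)


if __name__ == "__main__":
    # Toy counts, invented for the example (not our data).
    patients = [
        {"Otu001": 900, "Otu002": 90, "Otu003": 10},
        {"Otu001": 800, "Otu002": 150, "Otu003": 50},
    ]
    controls = [
        {"Otu002": 5, "Otu003": 95},
        {"Otu003": 100},
    ]
    print(flag_possible_contaminants(patients, controls))
```

Working with relative abundances rather than raw counts matters here because the controls and the patient samples can have very different nseqs. Whether flagged OTUs should then be removed outright, down-weighted, or merely reported is exactly the part I’m unsure about.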
Issue #2: The reviewer notes that with 454 pyrosequencing, ultra-low DNA levels can be sequenced, but at much reduced efficiency (i.e. number of reads), and random sequencing artifacts are reduced. His point (I think) is that I wouldn’t have this issue if only I had used 454, since most of the ‘sequences’ in my reagent controls (and perhaps in some of my samples) simply wouldn’t have been picked up. I’m not willing to go back to an older technology (one that our university is actually phasing out); MiSeq and similar platforms are the current wave. So how best to respond to the objection “if only you had used 454 you wouldn’t have this problem”?
Issue #3: In part because MiSeq is more sensitive than 454, we identified 24 phyla in our patient samples. Per the phylum taxonomy table in MOTHUR, the 6 most numerous phyla (Firmicutes, Proteobacteria, Bacteroidetes, Fusobacteria, Actinobacteria and ‘unclassified’) accounted for ~99% of the total nseqs (‘size’). Proteobacteria, the most numerous, has a size of 669907 (38% of total counts); phylum #6, Fusobacteria, has a size of 51040; whereas phylum #7 in rank, Acidobacteria, has a size of only 5988. Half of the phyla have counts < 1000, and the bottom 6 have a summed count of < 400. You can see the issue. In the manuscript we noted this and focused (we thought appropriately) on the top 6. Our reviewer had a major disagreement (I’m being polite) and said that the fact that we identified TWENTY-FOUR PHYLA!!! meant that our data was total crap. We had a similar issue with taxonomy at each level down to genus: we identified sequences belonging to 605 genera, but only the top 16 each had a size > 1% of the total, and only five were > 5%. Question: how do others here handle rare, sparse counts of phyla, genera, OTUs, etc.? This might be a reporting issue more than anything: may I draw a line (somewhere) and say that anything below the line doesn’t even need to be reported?
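To show what I mean by “drawing a line”, here is a minimal sketch that keeps every taxon at or above a chosen fraction of the total counts and pools everything below it into a single “Other” row, so that the rare taxa are still accounted for rather than silently dropped. The 1% cutoff, the function name, and the toy counts are assumptions for illustration only; they are not our real table and not anything built into MOTHUR.

```python
# Collapse rare taxa into one pooled row for reporting.
# Names, cutoff, and counts are hypothetical, for illustration only.

def summarize_taxa(counts, min_fraction=0.01, other_label="Other (below cutoff)"):
    """Return rows of (taxon, count, fraction), sorted by descending count.

    Taxa whose fraction of the total is below `min_fraction` are pooled
    into a single row labelled `other_label`, appended last.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")

    rows = []
    pooled = 0
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for taxon, n in ranked:
        if n / total >= min_fraction:
            rows.append((taxon, n, n / total))
        else:
            pooled += n  # below the line: counted, but not listed by name

    if pooled:
        rows.append((other_label, pooled, pooled / total))
    return rows


if __name__ == "__main__":
    # Toy counts, invented for the example.
    toy = {"PhylumA": 600, "PhylumB": 300, "PhylumC": 95, "PhylumD": 4, "PhylumE": 1}
    for taxon, n, frac in summarize_taxa(toy, min_fraction=0.01):
        print(f"{taxon}\t{n}\t{frac:.1%}")
```

Pooling rather than deleting keeps the column totals intact, which seems easier to defend to a reviewer than simply omitting the low-count phyla or genera.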
All thoughts greatly appreciated.
[1] Salter SJ, Cox MJ, Turek EM, Calus ST, Cookson WO, Moffatt MF, Turner P, Parkhill J, Loman NJ, Walker AW. Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. BMC Biol. 2014 Nov 12;12:87.