Low biomass sequencing depth

Hello, mothur community,

I have been working on a recent project that includes 16S V4 data from mouse lungs, provided by a collaborator. I processed these data through the MiSeq pipeline together with the associated animal fecal samples for OTU clustering, and then split the shared file into lung samples and fecal samples to analyze each set separately.

Read depth varies substantially among the lung samples, from 26 reads in the lowest sample to 99672 reads in the highest. Of course, the samples with really low coverage likely do not have sufficient sequencing depth to represent the microbial community present. I am trying to define a cutoff for what counts as “sufficient” coverage so that I can subset these samples. I have generated the attached rarefaction curve, where I can see that most curves reach the asymptotic phase by around 1k reads, but this is, of course, just my subjective view. Is there a better way to define this cutoff for including samples and then subsampling for OTU enrichment analysis?

Thanks!

Hi there,

I generally don’t look at these types of rarefaction curves. Rather, I look at a histogram with the number of reads on the x-axis and the number of samples on the y-axis. Are there any natural break points in the histogram? What number of sequences allows you to keep paired lung-fecal samples? These are the types of questions that I ask when picking a threshold. From there, we rarefy/subsample everything to that number of reads and discard any samples below the threshold. We did this on a human lung-oral cavity study and wound up with 500 sequences per sample (http://doi.org/10.1164/rccm.201501-0128OC).
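
As a rough illustration of that check, here is a minimal Python sketch. It assumes you already have per-sample read counts and a list of lung-fecal pairs. The sample names, the pairing, and most of the counts are made up for the example (only 26 and 99672 come from the post above), and the function names `depth_histogram` and `retention` are hypothetical helpers, not part of mothur.

```python
from collections import Counter

def depth_histogram(read_counts):
    """Count samples per order-of-magnitude bin.

    Bin k holds samples with 10**k to 10**(k+1) - 1 reads
    (samples with 0 reads also land in bin 0).
    """
    bins = Counter()
    for n in read_counts.values():
        bins[len(str(n)) - 1] += 1  # number of digits minus one
    return dict(sorted(bins.items()))

def retention(read_counts, pairs, threshold):
    """Return (samples kept, complete pairs kept) at a given read threshold."""
    kept = {s for s, n in read_counts.items() if n >= threshold}
    kept_pairs = [(a, b) for a, b in pairs if a in kept and b in kept]
    return len(kept), len(kept_pairs)

# Hypothetical read counts; replace with your own per-sample totals.
counts = {
    "lung_A": 26, "lung_B": 480, "lung_C": 1500, "lung_D": 99672,
    "fecal_A": 20000, "fecal_B": 35000, "fecal_C": 18000, "fecal_D": 41000,
}
# Hypothetical lung-fecal pairing by animal.
pairs = [(f"lung_{x}", f"fecal_{x}") for x in "ABCD"]

print(depth_histogram(counts))
for threshold in (400, 500, 1000):
    n_samples, n_pairs = retention(counts, pairs, threshold)
    print(threshold, n_samples, n_pairs)
```

Comparing the pair counts across a few candidate thresholds shows how many animals each choice would cost you, which is usually easier to weigh than the shape of a rarefaction curve.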

Hope this helps a bit…
Pat

Hi Pat,

Thank you very much for your response. It was very helpful!