I have a dataset of lung (low-biomass) samples from a group of patients with asthma. I did two analysis runs starting from the same FASTQ files.
Run 1 – includes all my controls, etc., about 140 samples altogether. I run all the way through the MiSeq SOP with all the usual defaults and the SILVA database. I look at taxonomy at the genus level and select a genus (e.g., Pseudomonas). Each sample in the shared and subsampled shared files has a count for Pseudomonas.
Run 2 – removes the controls, based on advice I’ve been given, as well as some samples that do not make a specified quality cut, so I’m down to about 95 samples. As before, I run the MiSeq SOP and use the same SILVA database. Now the shared and subsampled shared genus-level files have DIFFERENT counts for Pseudomonas in the samples that were also present in run 1. The counts differ by a lot.
These are the same FASTQ files, organized in two different runs with the same SOP, differing only in that some FASTQ files were excluded from the beginning. One would think that sample #1 would have the same genus-level count for the OTU that represents Pseudomonas in run 1 and in run 2. But it doesn’t. The same is true for SOME of the other entries in the taxonomy table at the phylum, order, family and genus levels. But (for example) Lactobacillus matched very closely across the two runs.
Thoughts? Does removing some samples cause the counts for the remaining samples to change? Might this be the result of chimera filtering, duplicate removal, etc. behaving differently in run 2 because a different number of samples is present?
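
For reference, the per-sample comparison described above amounts to something like the following. This is only a minimal sketch: it assumes the usual tab-delimited shared layout (label, Group, numOtus, then one column per OTU), the function names `read_shared` and `compare_runs` are hypothetical, and the counts in the two inline tables are made-up placeholders, not my real data.

```python
import csv
import io

# Columns in a shared file that are not OTU counts (assumed layout).
META_COLUMNS = ("label", "Group", "numOtus")


def read_shared(handle):
    """Parse a tab-delimited shared table into {sample: {otu: count}}."""
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        table[row["Group"]] = {
            name: int(value)
            for name, value in row.items()
            # Skip metadata columns and any empty trailing column.
            if name and name not in META_COLUMNS and value not in (None, "")
        }
    return table


def compare_runs(run1, run2, otu):
    """Return {sample: (count_in_run1, count_in_run2)} for samples in both runs."""
    common = sorted(set(run1) & set(run2))
    return {s: (run1[s].get(otu, 0), run2[s].get(otu, 0)) for s in common}


# Made-up placeholder tables standing in for the two shared files.
RUN1 = (
    "label\tGroup\tnumOtus\tOtu001\tOtu002\n"
    "1\tS1\t2\t120\t30\n"
    "1\tS2\t2\t15\t40\n"
    "1\tCTRL\t2\t3\t1\n"
)
RUN2 = (
    "label\tGroup\tnumOtus\tOtu001\tOtu002\n"
    "1\tS1\t2\t310\t30\n"
    "1\tS2\t2\t42\t40\n"
)

run1 = read_shared(io.StringIO(RUN1))
run2 = read_shared(io.StringIO(RUN2))

# Print sample, run 1 count, run 2 count, and the difference for one OTU.
for sample, (c1, c2) in compare_runs(run1, run2, "Otu001").items():
    print(sample, c1, c2, c2 - c1)
```

With real files, the `io.StringIO(...)` arguments would be replaced by open file handles for the two shared files, and the OTU name by whichever column is classified as Pseudomonas in that run. OTU labels are not guaranteed to refer to the same taxon in both runs, so the column would need to be looked up in each run's own taxonomy file.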