Sorry in advance for starting a new topic, but I couldn't find anything relevant using the search function. I would like to share a personal experience and ask the experts whether my intuition is right about a particular problem I am facing (which will hopefully also help others in the future).
I have been analyzing 16S MiSeq data from ca. 50 cow fecal samples. I am new to the lab and only arrived a few months ago, so the samples were sequenced and the data analyzed long before I arrived. The sequencing was done locally in a facility that favors a partial-overlap strategy: the sequenced amplicon is approx. 300bp long, so the forward and reverse reads do not fully overlap; instead they overlap over 200bp in the middle, with 50bp covered by a single read on each side. The people in my current lab did the analysis using dada2 and gave me the analyzed data as well as the original fastq files. Stubborn as I am, I redid the analysis from scratch using mothur and produced a 97% OTU table (I kept only the 200bp that fully overlap and trimmed the 50bp of single-read sequence on each side), and here starts the funny part:
Using mothur
All samples have a sequencing depth between 8200 and 25000 reads, with a median of 11581, so all samples seem perfectly usable in terms of sequencing depth.
The total number of OTUs is approx. 18000; if I removed singletons (which I don't), it would be 7975.
If I rarefy to 8000 and do a Bray-Curtis ordination, I see very nice clustering according to treatment (as I would expect, because the animals the samples came from were either infected or not infected with a gastrointestinal pathogen).
Using dada2
Samples have a sequencing depth between 60 and 20000 reads, with a median of 3700, and ten of the samples have fewer than 500 reads (three of those ten have <90 and most have <200).
The total number of ASVs is 249! If I remove singletons, 244.
Rarefaction to 1200 (or using unrarefied data) produces an ordination plot that is entirely different from the one produced by mothur, and there is absolutely no apparent clustering according to any variable (treatment, animal, etc.).
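To be explicit about what I mean by rarefying and by Bray-Curtis, here is a minimal sketch in plain Python. It is not what mothur or the dada2 workflow runs internally, just the textbook idea: subsample each sample to a fixed depth without replacement, drop samples that are too shallow, and compare the resulting count vectors. The function names and the toy counts are made up for illustration and are not my data.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample per-OTU/ASV counts to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads, i.e. the
    sample would be dropped from the rarefied table.
    """
    if sum(counts) < depth:
        return None
    # One list entry per read, labelled by the index of its OTU/ASV.
    reads = [i for i, c in enumerate(counts) for _ in range(c)]
    picked = Counter(random.Random(seed).sample(reads, depth))
    return [picked.get(i, 0) for i in range(len(counts))]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


if __name__ == "__main__":
    # Toy counts (hypothetical): two deep samples and one shallow one.
    samples = {
        "A": [50, 30, 20, 0],
        "B": [10, 0, 40, 50],
        "shallow": [3, 1, 0, 0],
    }
    depth = 60
    rarefied = {name: rarefy(c, depth) for name, c in samples.items()}
    kept = {name: r for name, r in rarefied.items() if r is not None}
    print("kept after rarefying:", sorted(kept))
    print("Bray-Curtis A vs B:", bray_curtis(kept["A"], kept["B"]))
```

The point of the `None` branch is that any sample below the chosen depth simply disappears from the ordination, which matters for the dada2 table where ten samples have fewer than 500 reads.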
Here is what I think:
I mentioned at the beginning that I have cow fecal samples, not to make people nauseous, but to point out that I would expect a massively complex microbiome. Also, considering that I have 50 samples, 2 different treatments and 10 animals, I would expect a decent number of OTUs/ASVs. This seems to be the case with mothur but not with dada2, which claims that the total number of species is 249!
I don't think there is something wrong with dada2 itself. However, I think that the combination of the sequencing strategy (50bp overhang on each side) and dada2's aggressive filtering has removed A LOT of sequences: this sequencing strategy produces reads with lots of sequencing errors (I think the quality drops a lot after a certain point in the single-read regions), and dada2 then removes them.
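To make the read layout concrete, here is a small sketch of the arithmetic. The 250bp read length is an assumption on my side: it is simply what a 300bp amplicon with a 200bp overlap and 50bp overhangs implies. The function name is made up for illustration.

```python
def read_layout(amplicon_len, read_len):
    """Return (overlap, single_read_each_side) in bp for paired-end reads
    that start at opposite ends of an amplicon.
    """
    if 2 * read_len < amplicon_len:
        raise ValueError("reads do not meet; there is a gap in the middle")
    # Region covered by both reads (capped at the amplicon length).
    overlap = min(amplicon_len, 2 * read_len - amplicon_len)
    # Region at each end covered by only one read.
    single = max(0, amplicon_len - read_len)
    return overlap, single


if __name__ == "__main__":
    # Assumed 250bp reads on a 300bp amplicon.
    overlap, single = read_layout(300, 250)
    print(f"overlap: {overlap}bp, single-read region per side: {single}bp")
```

With these numbers, 100bp of the 300bp amplicon (one third) is supported by only one read, which is the part I trimmed away in the mothur analysis.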
I posted the whole story here because I'd like to hear if there is something obvious I might be missing, and I would actually be happy to share some of these data confidentially (confidentially because they are unpublished) if someone wants to try to reproduce this. I know that I am a mothur fanboy and clearly biased, but I would like to use these data (if possible), so I would like to figure out the best strategy. I welcome any comments, criticism and suggestions.
Happy Wednesday