I have one question that is really confusing me. When I looked at the alignr.report file, all of my reads were assigned to some species from the reference. So in total I have 7000 hits for 7000 sequences, which gives me 1200 unique species.
Now, when I look at my otu file at distance 0.03, I get 2200 OTUs.
What confuses me is why the two numbers are different. If my samples are hitting only 1200 unique species, then how come I have 2200 OTUs? Is there something wrong in my analysis?
Can anyone please explain this?
Because not all of your sequences that map to the same reference are the same. This is an illustration of the limitations built into database-based approaches. The resolution is generally lousy because the db can’t hold all of the diversity that’s in your sample.
I understand that several of my sequences can get the same hit in the database, but to make the OTUs we use the list file, which ultimately comes from the alignment file. So if the alignment file has only 1200 hits, then how come there are 2200 OTUs? Does the formation of OTUs have nothing to do with the alignment? I guess that is not correct. Can you please point me to where I can read about this?
Thanks for your help and patience!!!
So imagine you have 100 sequences that all map to the same reference sequence when aligning. Those 100 sequences probably are not identical to the reference - they may differ by 2 or 3% from the reference and by 2-3% from each other. The alignment only lines your sequences up against the reference; the OTUs come from the distances between your own sequences. So when you use the resulting alignment you’ll still have a lot of genetic diversity even though they all map to the same reference. This is why you’ll get multiple OTUs per genus / reference sequence.
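To make that concrete, here is a small Python sketch. Everything in it is made up for illustration: the two toy references, the four hand-built reads, and the greedy clustering loop (each read joins the first OTU whose members are all within the cutoff). It is not mothur's code or its clustering algorithm, just a rough picture of why the number of reference hits and the number of OTUs can differ.

```python
# Toy illustration only: hand-made sequences, not real data and not mothur's code.

BASES = "ACGT"

def mutate(seq, positions):
    """Return a copy of seq with a substitution at each given position."""
    chars = list(seq)
    for p in positions:
        chars[p] = BASES[(BASES.index(chars[p]) + 1) % 4]
    return "".join(chars)

def distance(a, b):
    """Fraction of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Two hypothetical reference sequences, 100 bases each.
references = {"ref_1": "ACGT" * 25, "ref_2": "TTGCA" * 20}

# Four hypothetical reads, each 2-3% away from ref_1 but mutated at different sites.
reads = {
    "read_a": mutate(references["ref_1"], [0, 10]),
    "read_b": mutate(references["ref_1"], [0, 10, 20]),
    "read_c": mutate(references["ref_1"], [40, 50]),
    "read_d": mutate(references["ref_1"], [60, 70, 80]),
}

# Best reference hit for each read (closest reference by mismatch fraction).
hits = {
    name: min(references, key=lambda r: distance(seq, references[r]))
    for name, seq in reads.items()
}

# Simplified greedy clustering: a read joins the first OTU in which every
# member is within the cutoff, otherwise it starts a new OTU.
CUTOFF = 0.03
otus = []
for name, seq in reads.items():
    for otu in otus:
        if all(distance(seq, reads[member]) <= CUTOFF for member in otu):
            otu.append(name)
            break
    else:
        otus.append([name])

print(len(set(hits.values())), "unique reference hit(s)")
print(len(otus), "OTUs at", CUTOFF)
```

All four reads hit ref_1, yet they fall into three OTUs at 0.03, because read_c and read_d are 4-6% away from the other reads even though each is only 2-3% away from the reference.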
Thank you so much, it is clear now.