Many OTUs classify into the same species?

Hi, I have a question and hope to get some answers. I have a MiSeq dataset which I analyzed using mothur. I classified my OTUs using the Greengenes database, which classified a number of my OTUs to the species level (unlike SILVA, which only goes to the genus level). Assuming that my OTUs are species, I would expect all sequences for a given species to be binned into the same OTU. However, I find that tens of OTUs are classified into the same species. I used a mock community and my sequencing error showed as “0”. I used 95% similarity for my OTUs. I got almost 14k OTUs for a 95+1 sample dataset. (More details: OTUs 2, 3, and 5 belong to the same species.)

Many thanks,

Ousama
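
To check how many OTUs share each species label, a minimal Python sketch along these lines could be run against the consensus taxonomy table. It assumes a tab-separated table with OTU, Size, and Taxonomy columns and Greengenes-style rank prefixes (g__, s__). The embedded table, including the OTU names, sizes, and lineages, is invented for illustration, and otus_per_species is a hypothetical helper, not a mothur command.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical taxonomy table (tab-separated: OTU, Size, Taxonomy).
# All OTU names, sizes, and lineages here are invented for illustration.
EXAMPLE = (
    "OTU\tSize\tTaxonomy\n"
    "Otu0001\t900\tk__Bacteria(100);g__GenusA(100);s__alpha(95);\n"
    "Otu0002\t750\tk__Bacteria(100);g__GenusA(100);s__alpha(90);\n"
    "Otu0003\t600\tk__Bacteria(100);g__GenusB(100);s__(60);\n"
    "Otu0004\t420\tk__Bacteria(100);g__GenusA(98);s__alpha(88);\n"
    "Otu0005\t300\tk__Bacteria(100);g__GenusB(100);s__beta(91);\n"
)


def otus_per_species(handle):
    """Group OTU names by their (genus, species) labels.

    OTUs without a species-level name (an empty "s__") are skipped.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        # Split the lineage and drop confidence values such as "(100)".
        ranks = [
            re.sub(r"\(\d+\)$", "", rank)
            for rank in row["Taxonomy"].strip(";").split(";")
        ]
        genus = next((r for r in ranks if r.startswith("g__")), None)
        species = next(
            (r for r in ranks if r.startswith("s__") and len(r) > 3), None
        )
        if species is None:
            continue
        groups[(genus, species)].append(row["OTU"])
    return dict(groups)


shared = otus_per_species(io.StringIO(EXAMPLE))
for (genus, species), otus in shared.items():
    print(genus, species, len(otus), otus)
```

With a real file, the io.StringIO(EXAMPLE) argument would be replaced by an open file handle. Column names and label prefixes may differ between reference versions, so they should be checked against the actual table first.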

Yes, and many “species” collapse into a single OTU (Shigella and E. coli, for example). Whether or not a “species” gets a name has little to do with its 16S sequence and everything to do with the amount of work someone wanted to put into characterizing and naming it. I don’t think you can accurately assign species using just V4, so I wouldn’t even look at those identities even though Greengenes spits them out.

ETA: I just noticed you used 95% OTUs? Those are genus-level OTUs. Why would you want to identify genus-level OTUs to species?

Thank you. I agree. It is very challenging.
Just for clarification: I used R1 only, for V1-3 (V1-2 after trimming). Also, I used 95% to identify species, not genus. There is a long-standing argument that 97% is not accurate. The 97% cutoff is based on old DNA-DNA hybridization studies. We can get into details, but for now, just pick a reference database of your choice and bin it into OTUs. Based on the full length of the sequences, you are likely to get the expected number of species or OTUs at 97%, for example 5k. However, if you trim the sequences to V1-3, or more accurately V1-2 (or V4, …), as I did in a previous study, you find that the number of OTUs jumps to at least 8k at 97% and is about 5k at 95%, meaning 95% is likely a better indicator for species. I think this argument is valid; I have used 95% in previous papers, and it is also being suggested by others.


Thanks Ousama
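
The arithmetic behind the trimming argument above can be sketched in a few lines of Python. All numbers are hypothetical: a 1500 bp full-length gene, a 300 bp variable region, and an assumed split of mismatches between them. The sketch only shows how the same pair of sequences can sit at one identity over the full length and a lower one over a variable region. It does not settle which cutoff corresponds to species.

```python
def percent_identity(mismatches, length):
    """Percent identity for an ungapped alignment of the given length."""
    return 100.0 * (length - mismatches) / length


# Hypothetical values, chosen only to illustrate the argument.
FULL_LENGTH = 1500       # assumed full-length 16S alignment, in bp
REGION_LENGTH = 300      # assumed variable-region alignment, in bp
TOTAL_MISMATCHES = 45    # mismatches over the full length
REGION_MISMATCHES = 15   # assumed share of those inside the variable region

full_length_identity = percent_identity(TOTAL_MISMATCHES, FULL_LENGTH)
region_identity = percent_identity(REGION_MISMATCHES, REGION_LENGTH)

print(f"full length: {full_length_identity:.1f}%")
print(f"variable region: {region_identity:.1f}%")
```

Under these assumed numbers the pair would be merged at a 97% cutoff on full-length sequences but split at 97% on the variable region alone, which is one way the OTU count can rise after trimming.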