I am new to microbiome analysis and mothur, and I am trying to analyse amplicon sequencing data for dnaK, a conserved core gene, rather than 16S.
We also sequenced 16S, which I processed following the MiSeq SOP tutorial. I then tried to apply the same commands to dnaK, adjusting the parameters because dnaK is less conserved than 16S.
In principle, dnaK should resolve species more accurately than 16S. Bacterial strains are usually considered the same species if their genome-wide average nucleotide identity (ANI) is > 95%, so I expect dnaK sequences from the same species to share > 95% nucleotide identity as well.
So, my question: I would like to run pre.cluster on my reads using this 95% identity threshold and then run classify.seqs on them (I have built my own dnaK reference database). This way I can get an idea of how many reads belong to my genus of interest, how many belong to known and unknown species, and perhaps how many belong to undescribed species.
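To get read counts per genus or species after classify.seqs, the per-sequence taxonomy has to be combined with the per-sequence read counts. The sketch below is a minimal Python illustration of that bookkeeping under stated assumptions: the tab-separated layouts of the taxonomy and count-table lines, the rank indices used for genus and species, and the toy names (`seqA`, `Genus_x`, and so on) are all hypothetical and should be checked against the files your mothur version actually writes.

```python
"""Sketch: tally reads per taxon from mothur-style classify.seqs output.

Assumptions (hypothetical, check against your own files):
- taxonomy lines look like "seqname<TAB>Bacteria(100);Genus_x(98);..."
- count-table lines look like "seqname<TAB>total<TAB>...", with a header
  row starting "Representative_Sequence" and optional "#" comment lines.
"""
import re
from collections import Counter
from typing import Iterable

# Matches confidence scores such as "(100)" or "(97.5)" after each rank name.
CONFIDENCE = re.compile(r"\(\d+(?:\.\d+)?\)")


def parse_taxonomy(lines: Iterable[str]) -> dict[str, list[str]]:
    """Map each sequence name to its list of ranks, confidences removed."""
    taxa = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, tax = line.split("\t", 1)
        # Trailing ";" yields an empty last field, which is dropped here.
        taxa[name] = [CONFIDENCE.sub("", r) for r in tax.split(";") if r]
    return taxa


def parse_counts(lines: Iterable[str]) -> dict[str, int]:
    """Map each representative sequence to its total read count."""
    counts = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "Representative_Sequence":
            continue  # header row
        counts[fields[0]] = int(fields[1])
    return counts


def reads_per_taxon(taxonomy, counts, level: int) -> Counter:
    """Sum read counts by the taxon name at rank index `level` (0 = first).

    Sequences with no rank at that depth are tallied as "unclassified"
    (an assumption; mothur's own labels may differ).
    """
    totals = Counter()
    for name, n_reads in counts.items():
        ranks = taxonomy.get(name, [])
        taxon = ranks[level] if level < len(ranks) else "unclassified"
        totals[taxon] += n_reads
    return totals


# Toy, made-up input in the assumed layouts.
taxonomy_lines = [
    "seqA\tBacteria(100);Genus_x(100);Genus_x_sp1(97);\n",
    "seqB\tBacteria(100);Genus_x(99);unclassified;\n",
    "seqC\tBacteria(100);Genus_y(100);Genus_y_sp2(90);\n",
]
count_lines = [
    "Representative_Sequence\ttotal\tS1\tS2\n",
    "seqA\t120\t70\t50\n",
    "seqB\t30\t10\t20\n",
    "seqC\t50\t25\t25\n",
]
tax = parse_taxonomy(taxonomy_lines)
cnt = parse_counts(count_lines)
print(reads_per_taxon(tax, cnt, level=1))  # genus level in this toy layout
print(reads_per_taxon(tax, cnt, level=2))  # species level in this toy layout
```

Which index corresponds to genus or species depends entirely on how the custom dnaK database's taxonomy strings are structured, so `level` is left as a parameter rather than hard-coded.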
After some reading on the forum, I came up with diffs=7: my amplicon is 295 bp, 5% of 295 is about 14.75 (so roughly 14 differences), and 7 is half of that.
Is that correct?
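The arithmetic behind the 95% threshold can be checked with a few lines of Python. This is a minimal sketch; the function names are hypothetical, and the "radius" step is only one possible reading of where diffs=7 comes from, assuming that diffs bounds the mismatches between each merged read and the more abundant read it is merged into.

```python
def max_mismatches(length_bp: int, identity_pct: int) -> int:
    """Largest number of mismatches that still keeps identity >= identity_pct.

    Integer arithmetic avoids floating-point surprises such as 295 * 0.05.
    """
    return length_bp * (100 - identity_pct) // 100


def identity_pct(length_bp: int, mismatches: int) -> float:
    """Percent identity of two aligned sequences of equal length."""
    return 100 * (length_bp - mismatches) / length_bp


length = 295
d = max_mismatches(length, 95)  # 14, since 15 mismatches would drop below 95%
print(d, round(identity_pct(length, d), 2), round(identity_pct(length, d + 1), 2))

# One possible reading of diffs=7 (an assumption, not mothur documentation):
# if every read may sit up to `radius` mismatches from its cluster centre,
# two reads in the same cluster can be up to 2 * radius apart (triangle
# inequality), so radius = d // 2 keeps any pair within 95% identity.
radius = d // 2
print(radius)
```

So 14 mismatches over 295 bp is about 95.25% identity and 15 is about 94.92%; whether 14 or 7 is the right diffs value depends on whether the 95% should hold between each read and its centre or between any two reads in a cluster.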
How should I set diffs= so that sequences with at least 95% identity are clustered together as the same species?
Or should I instead leave this to the later OTU step from the tutorial and use classify.otu with label=0.05?
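In the MiSeq SOP, pre.cluster is presented as a further denoising step, whereas a 0.05 cutoff belongs to the OTU clustering step. The toy sketch below is a simplified, hypothetical greedy merge in the spirit of pre.cluster, not mothur's actual code: reads are assumed to be already aligned to equal length, and mismatches are counted position by position with gaps treated as ordinary characters. It illustrates one consequence of a large diffs: a rarer but distinct read within diffs of a more abundant one is absorbed into it, so classify.seqs would only ever see the abundant representative.

```python
"""Toy greedy, abundance-sorted merge in the spirit of mothur's pre.cluster.

Simplifications (assumptions, not mothur's actual code): reads are already
aligned to equal length, and mismatches are counted position by position
with gap characters treated like any other character.
"""


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length aligned reads differ."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def greedy_precluster(abundance: dict[str, int], diffs: int) -> dict[str, int]:
    """Merge each rarer read into the first more abundant read within `diffs`.

    Returns {centre_sequence: merged_abundance}.
    """
    # Most abundant first; ties broken by sequence so the result is deterministic.
    order = sorted(abundance, key=lambda s: (-abundance[s], s))
    merged_into: dict[str, str] = {}
    centres: dict[str, int] = {}
    for i, centre in enumerate(order):
        if centre in merged_into:
            continue  # already absorbed by a more abundant read
        centres[centre] = abundance[centre]
        for rarer in order[i + 1:]:
            if rarer not in merged_into and hamming(centre, rarer) <= diffs:
                merged_into[rarer] = centre
                centres[centre] += abundance[rarer]
    return centres


# Made-up 10 bp "reads" and abundances, purely for illustration.
abundance = {
    "ACGTACGTAC": 100,
    "ACGTACGTAA": 5,   # 1 mismatch from the most abundant read
    "ACGTACGAAA": 3,   # 2 mismatches from the most abundant read
    "TTTTACGTAC": 40,  # 3 mismatches from the most abundant read
}
for d in (1, 2, 3):
    print(d, greedy_precluster(abundance, d))
```

With diffs=3 every toy read collapses into the most abundant one, which is the kind of merging a species-level diffs could cause before classification; keeping diffs small and applying the 0.05 cutoff at the OTU step keeps the two decisions separate.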
Thank you for your help!