Hi,
I found a problem when using “trainset14_032015.pds.tax” as a reference file.
For a sequence EU679419_S001240760 in the reference file,
“trainset14_032015.pds.tax” gives “Bacteria;Chloroflexi;Dehalococcoidetes;Dehalogenimonas;”
but the older version, “trainset9_032012.pds.tax” gives “Bacteria;“Chloroflexi”;“Dehalococcoidetes”;“Dehalococcoidetes”_order_incertae_sedis;“Dehalococcoidetes”_family_incertae_sedis;Dehalogenimonas;”
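
To make the difference concrete, here is a minimal sketch that counts the ranks in the two lineages quoted above. The helper name split_ranks is my own, and I have written the quotation marks as plain straight quotes, which is an assumption about how they appear in the actual file.

```python
def split_ranks(taxonomy):
    """Split a semicolon-delimited lineage into its non-empty rank names."""
    return [name.strip() for name in taxonomy.split(";") if name.strip()]

# Lineages for EU679419_S001240760 as quoted in this post
# (straight quotes assumed in place of the typographic ones).
new_lineage = "Bacteria;Chloroflexi;Dehalococcoidetes;Dehalogenimonas;"
old_lineage = ('Bacteria;"Chloroflexi";"Dehalococcoidetes";'
               '"Dehalococcoidetes"_order_incertae_sedis;'
               '"Dehalococcoidetes"_family_incertae_sedis;Dehalogenimonas;')

new_ranks = split_ranks(new_lineage)
old_ranks = split_ranks(old_lineage)

# The newer lineage has 4 ranks, the older one has 6.
print(len(new_ranks), len(old_ranks))
```

With only four ranks, the genus name sits where the order would normally be, which seems to match the classification results below.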
For a sequence in my data, the former results in:
0.2.16 Chloroflexi
0.2.16.5 Dehalococcoidetes
0.2.16.5.1 Dehalogenimonas
0.2.16.5.1.1 unclassified
0.2.16.5.1.1.1 unclassified
The latter results in:
0.2.9 Chloroflexi
0.2.9.2 Dehalococcoidetes
0.2.9.2.1 Dehalococcoidetes_order_incertae_sedis
0.2.9.2.1.1 Dehalococcoidetes_family_incertae_sedis
0.2.9.2.1.1.1 Dehalogenimonas
That is, if I use “trainset14_032015.pds.tax”, Dehalogenimonas is placed at the order level and the sequence is left unclassified at the genus level.
It seems that many sequences in “trainset14_032015.pds.tax” have the same problem.
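
One way to check how many entries are affected is sketched below. It assumes each line of the .tax file holds a sequence name, whitespace, and then the semicolon-delimited lineage, and that a complete lineage has six ranks (domain to genus). The function name find_short_lineages and the second sample entry are hypothetical and only for illustration.

```python
def find_short_lineages(lines, expected_ranks=6):
    """Return (sequence name, rank count) for lineages with fewer ranks than expected."""
    short = []
    for line in lines:
        # Assumed layout: sequence name, whitespace, then the lineage.
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, taxonomy = parts
        ranks = [r for r in taxonomy.split(";") if r.strip()]
        if len(ranks) < expected_ranks:
            short.append((name, len(ranks)))
    return short

# Small sample: the first entry is from this post, the second is invented.
sample = [
    "EU679419_S001240760\tBacteria;Chloroflexi;Dehalococcoidetes;Dehalogenimonas;",
    "seqA\tBacteria;PhylumX;ClassX;OrderX;FamilyX;GenusX;",
]
print(find_short_lineages(sample))
```

To run it on the real file, the open file handle for “trainset14_032015.pds.tax” could be passed in place of the sample list, since the function only needs an iterable of lines.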
Thank you for your help.