Differences between greengenes gg_13_5_99 and gg_13_8_99

Hello,

I am analyzing the microbial communities in marine sponges using the MiSeq SOP. For the reference alignment and taxonomy, I have tried both versions of the greengenes reference database: the August 2013 version (gg_13_8_99) and the May 2013 version (gg_13_5_99). I tried both because the May version works much better than the August version for aligning my clone library 16S sequences, and I am hoping to keep my clone library and Illumina data analyses consistent.

The website says that both greengenes versions have 202,421 bacterial and archaeal sequences, so I am wondering what differs between the two. After processing my Illumina data with each reference database, I am finding differences in how sequences are aligned and classified. For example, sequences classified in the phylum Deferribacteres were much more abundant (2,067 sequences) when I used the gg_13_5_99 alignment and taxonomy. When I ran the same data with the gg_13_8_99 database, only 6 sequences in total were classified as Deferribacteres.

I would appreciate any insight into the differences between these two versions of the greengenes database, which were published only a few months apart. Thanks!!

The identical sequence count on the website is probably a typo. The numbers for the two versions are likely a bit different.
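If you want to check the counts yourself, you can count the records in each reference FASTA file. Here is a minimal Python sketch, assuming the references are plain-text FASTA files; the file names in the usage comment are placeholders, not the actual names of the distributed files:

```python
def count_fasta_records(path):
    """Count sequences in a FASTA file by counting its header lines."""
    count = 0
    with open(path) as handle:
        for line in handle:
            # Each FASTA record begins with a header line starting with ">".
            if line.startswith(">"):
                count += 1
    return count

# Hypothetical usage; substitute the paths to your own reference files:
# print(count_fasta_records("gg_13_5_99.fasta"))
# print(count_fasta_records("gg_13_8_99.fasta"))
```

If the two counts differ, the number on the website is simply out of date for one of the versions.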

Also, I would never recommend using greengenes to align sequences. We only recommend aligning sequences against the SILVA reference alignment. For classifying, greengenes would be appropriate.