Splitting up mixed Bacteria/Archaea 16S rRNA datasets

Hey all,

I am currently working with a large Illumina (MiSeq) dataset of 16S rRNA gene sequences (V4 region), generated with primers targeting both Archaea and Bacteria. In previous 454 studies I was able to identify and eliminate organisms from the other kingdom by aligning against the SILVA archaea and bacteria databases. Sequences belonging to the other kingdom generally came out as ‘unclassified’. If they could then be aligned and identified within the other kingdom, I removed them from the corresponding dataset before further analysis. This does not work with the Illumina dataset (maybe because of the shorter read length?): almost all archaeal sequences align within various bacterial groups (down to the phylum level), and bacterial sequences align within archaeal groups. Is there an elegant way to split the dataset into Bacteria and Archaea within mothur?

Another problem, unrelated to my main question but related to the dataset itself: after alignment to either the bacterial or the archaeal SILVA reference alignment, UCHIME flags up to 60% of all sequences as chimeras, which makes no sense.

Can anybody help me with this issue, or has anyone had to deal with similar datasets?



I wouldn’t suggest screening by kingdom based on alignment, as everything will align if you force it. It would be better to generate a concatenated alignment of archaea and bacteria and run your data through the pipeline as we describe in the SOP. When you get to the classify.seqs step, I would probably use the gg references we provide and then use the get.lineage command to generate archaea- and bacteria-specific files.
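To make the "classify first, then split by lineage" idea concrete, here is a minimal Python sketch of what the split amounts to. It is not mothur code, and get.lineage remains the direct way to do this inside mothur. It assumes a two-column, tab-separated taxonomy file (sequence name, then a semicolon-separated lineage) whose first rank is the kingdom, optionally with a `k__` prefix and a bootstrap value in parentheses. That layout, and the helper names `kingdom_of` and `split_by_kingdom`, are assumptions for illustration.

```python
import re
from collections import defaultdict


def kingdom_of(lineage):
    """Return the first (kingdom) rank of a lineage string.

    Assumed format: 'k__Bacteria(100);p__Firmicutes(98);...' or
    'Bacteria(100);Firmicutes(98);...'. The bootstrap value and any
    'k__' prefix are stripped.
    """
    first = lineage.strip().split(";")[0]
    first = re.sub(r"\(\d+(\.\d+)?\)$", "", first)  # drop "(100)"-style confidence
    if first.startswith("k__"):
        first = first[3:]
    return first


def split_by_kingdom(taxonomy_lines):
    """Group sequence names by kingdom from (assumed) two-column taxonomy lines."""
    groups = defaultdict(list)
    for line in taxonomy_lines:
        if not line.strip():
            continue  # skip blank lines
        name, lineage = line.rstrip("\n").split("\t", 1)
        groups[kingdom_of(lineage)].append(name)
    return dict(groups)


# Hypothetical example lines in the assumed format
lines = [
    "seq1\tk__Bacteria(100);p__Firmicutes(98);",
    "seq2\tk__Archaea(100);p__Euryarchaeota(95);",
    "seq3\tBacteria(100);Proteobacteria(90);",
    "seq4\tunknown;",
]
groups = split_by_kingdom(lines)
```

Sequences that land in neither group (the "unknown" bucket here) are worth looking at before you discard them, since that is where primer or off-target artifacts tend to end up.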

It’s possible for 60% of your unique sequences to be chimeric, but when you look at the total percentage removed once the duplicates are counted, you’ll generally have less than 20% chimeras. If it’s higher, I’d wonder about your PCR conditions.
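The difference between "percent of uniques" and "percent of reads" can be shown with a little arithmetic. Here is a minimal Python sketch using made-up counts, not data from this thread. It assumes the flagged uniques are mostly low-abundance sequences and the abundant ones are not flagged. The function name `chimera_fractions` is hypothetical.

```python
def chimera_fractions(counts, chimeric):
    """Return (fraction of unique sequences flagged, fraction of total reads flagged).

    counts:   mapping of unique-sequence name -> number of reads it represents
    chimeric: set of unique-sequence names flagged as chimeric
    """
    n_unique = len(counts)
    n_reads = sum(counts.values())
    unique_frac = sum(1 for name in counts if name in chimeric) / n_unique
    read_frac = sum(c for name, c in counts.items() if name in chimeric) / n_reads
    return unique_frac, read_frac


# Hypothetical data: 4 abundant non-chimeric uniques, 6 chimeric singletons
counts = {f"real{i}": 100 for i in range(4)}
counts.update({f"chim{i}": 1 for i in range(6)})
chimeric = {f"chim{i}" for i in range(6)}

u, r = chimera_fractions(counts, chimeric)
print(f"{u:.0%} of uniques, {r:.1%} of reads")  # prints "60% of uniques, 1.5% of reads"
```

With real data the read-weighted figure is the one to compare against the rough 20% threshold mentioned above.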


Thank you for this helpful answer; I will try that.

Adding to my question: what is the best way to build such a concatenated alignment from the SILVA Archaea and Bacteria alignments?
Like this? align.seqs(candidate=silva.archaea.fasta, template=silva.bacteria.fasta)
And which method: blastn, gotoh, or needleman?

Any experience or concerns?

If you’re using Mac or Linux, you could do:

cat silva.bacteria.fasta silva.archaea.fasta > silva.concatenate.fasta
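`cat` just appends one file to the other, with no realignment. For Windows users, or to add a quick sanity check, here is a minimal Python sketch that does the same thing. The check rests on an assumption: a concatenated file only makes sense as a single template if both reference alignments share the same column coordinates, so every record should have the same aligned length. The helper names `read_fasta` and `concatenate_alignments` are hypothetical.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file, joining wrapped sequence lines."""
    header, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line, []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def concatenate_alignments(paths, out_path):
    """Write every record from `paths` to `out_path` in order (like `cat`).

    Returns the set of aligned lengths seen. A single value suggests the
    inputs share one column space; more than one means they do not.
    """
    widths = set()
    with open(out_path, "w") as out:
        for path in paths:
            for header, seq in read_fasta(path):
                widths.add(len(seq))
                out.write(f"{header}\n{seq}\n")
    return widths
```

For example, `concatenate_alignments(["silva.bacteria.fasta", "silva.archaea.fasta"], "silva.concatenate.fasta")` should return a set with exactly one width if the two references line up.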