Splitting up mixed Bacteria/Archaea 16S rRNA datasets

BSP · May 6, 2014, 1:47pm

Hey all,

I am currently working with a large Illumina (MiSeq) dataset of 16S rRNA gene sequences, generated with primers that target both archaea and bacteria (V4 region). In previous 454 studies I was able to identify and eliminate organisms from the other kingdom by alignment (SILVA archaea and bacteria databases). Sequences from the other kingdom generally came out as ‘unclassified’; if I could then align and identify them within the other kingdom, I eliminated them from the corresponding dataset before further analysis. This does not work with the Illumina dataset (maybe due to the shorter read length?): almost all archaeal sequences seem to align within different bacterial groups (down to the phylum level), and bacterial sequences align within archaeal groups. Is there an elegant way to split the dataset into bacteria and archaea within mothur?

Another problem, unrelated to my main question but concerning the same dataset: after alignment to either the bacterial or the archaeal SILVA reference alignment, UCHIME identifies up to 60% of all sequences as chimeras, which makes no sense.

Can anybody help me with this issue, or has anyone had to deal with similar datasets?

Best,

BSP

pschloss · May 7, 2014, 11:43am

I wouldn’t suggest screening by kingdom based on alignment, as everything will align if you force it. It would be better to generate a concatenated alignment of archaea and bacteria and run your data through the pipeline as we describe in the SOP. When you get to the classify.seqs step, I would probably use the gg (greengenes) references we provide and then use the get.lineage command to generate archaea- and bacteria-specific files.
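
To make the selection step concrete, here is a minimal sketch of the kind of split that is wanted once every read has a classification. It uses plain awk on a toy taxonomy file rather than get.lineage itself, so it only illustrates the logic and does not replace the mothur command (which also subsets the matching fasta and count files). The file contents, the `k__` prefixes, the confidence values, and the helper name `names_in_domain` are all assumptions made up for the illustration.

```shell
# Hypothetical toy taxonomy file: read name, a tab, then a semicolon-separated lineage.
taxonomy_file=$(mktemp)
printf 'read1\tk__Bacteria(100);p__Firmicutes(98);\nread2\tk__Archaea(100);p__Euryarchaeota(95);\nread3\tk__Bacteria(100);p__Proteobacteria(99);\nread4\tunknown(100);\n' > "$taxonomy_file"

# Print the names of reads whose top-level rank contains the given domain name.
# Usage: names_in_domain DOMAIN TAXONOMY_FILE
names_in_domain() {
    awk -F'\t' -v domain="$1" '
        { split($2, ranks, ";") }                  # ranks[1] is the top-level rank
        index(ranks[1], domain) > 0 { print $1 }   # keep reads assigned to that domain
    ' "$2"
}

names_in_domain Archaea "$taxonomy_file"
names_in_domain Bacteria "$taxonomy_file"
```

With the toy data, the first call prints read2 and the second prints read1 and read3; read4 belongs to neither list, which is the kind of read worth inspecting by hand.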

It’s possible for 60% of your unique sequences to be chimeric, but when you include the duplicates and look at the total percentage removed, you’ll generally have less than 20% chimeras. If it’s higher, I’d wonder about your PCR conditions.
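
The difference between the two percentages can be seen with a small worked example. The sketch below assumes a made-up table of five unique sequences with the number of reads each one represents and a chimera flag; the numbers, the column layout, and the helper name `chimera_summary` are invented for illustration and are not output from UCHIME or mothur.

```shell
# Hypothetical toy table: unique sequence name, reads it represents, chimera flag.
chimera_table=$(mktemp)
printf 'seq1\t60\tno\nseq2\t25\tno\nseq3\t6\tyes\nseq4\t5\tyes\nseq5\t4\tyes\n' > "$chimera_table"

# Report the flagged fraction twice: per unique sequence and per read.
chimera_summary() {
    awk -F'\t' '
        { uniques++; reads += $2 }                          # totals over all rows
        $3 == "yes" { chim_uniques++; chim_reads += $2 }    # totals over flagged rows
        END {
            printf "unique sequences flagged: %.1f%%\n", 100 * chim_uniques / uniques
            printf "total reads flagged: %.1f%%\n", 100 * chim_reads / reads
        }
    ' "$1"
}

chimera_summary "$chimera_table"
```

Here 3 of 5 unique sequences are flagged (60.0%), but they account for only 15 of 100 reads (15.0%), because the flagged sequences are the rare ones.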

Pat

BSP · May 7, 2014, 12:03pm

Thank you for this helpful answer, I will try that.

BSP · May 8, 2014, 9:01am

Adding to my question: what is the best way to build such a concatenated alignment from the SILVA Archaea and Bacteria alignments? Like this?

align.seqs(candidate=silva.archaea.fasta, template=silva.bacteria.fasta)

And with blastn, gotoh, or needleman?

Any experience or concerns?

pschloss · May 9, 2014, 12:47pm

If you’re using Mac/Linux you could do:

cat silva.bacteria.fasta silva.archaea.fasta > silva.concatenate.fasta
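
Plain concatenation only gives a usable reference alignment if both files are already in the same alignment space, meaning every aligned sequence has the same number of columns. A quick check along these lines may help; it is a sketch that uses tiny made-up stand-ins for the two SILVA files, and the helper name `aligned_lengths` is hypothetical.

```shell
# Hypothetical toy stand-ins for silva.bacteria.fasta and silva.archaea.fasta.
workdir=$(mktemp -d)
printf '>bac1\n..AC-GT..\n>bac2\n..ACAGT..\n' > "$workdir/silva.bacteria.fasta"
printf '>arc1\n..TC-GA..\n' > "$workdir/silva.archaea.fasta"

cat "$workdir/silva.bacteria.fasta" "$workdir/silva.archaea.fasta" > "$workdir/silva.concatenate.fasta"

# Print each distinct aligned length followed by the number of sequences with that length.
# Sequences wrapped over several lines are summed before counting.
aligned_lengths() {
    awk '
        /^>/ { if (seen) count[len]++; len = 0; seen = 1; next }   # header: close previous record
        { len += length($0) }                                       # sequence line: add its columns
        END { if (seen) count[len]++; for (l in count) print l, count[l] }
    ' "$1"
}

aligned_lengths "$workdir/silva.concatenate.fasta"
```

For the toy files this prints a single line, `9 3`. On the real concatenated file, a single line of output means all sequences share one alignment length; more than one line means the two references were not aligned in the same coordinate system and should not simply be concatenated.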
