16S and 18S Sequence Mix for analysis!?

Hi @ all!

First of all, I’m a real noob to NGS amplicon analyses and not a bioinformatician :roll:

Well, let’s get to my problem. We amplified the 18S V9 region from wastewater samples using 18S primers (Euk1391f and EukBr, from Amaral-Zettler 2009) and a 2-step barcoding approach. As a quick test, we sequenced them on a HiSeq in High Output mode with 75 bp SE run settings to get about 2 million reads per sample. The sequencing worked quite well: the quality is OK and there are not many overrepresented sequences.

However, a rather quick screening with the VAMPS software showed that almost half of the reads are bacterial contaminants and another quarter are unknown sequences…
If I understand correctly, an OTU cluster analysis in mothur needs aligned reads, which doesn’t make much sense with a mix of 16S and 18S sequences, right!? If I align all reads in mothur against the SILVA seed database I do get an alignment, but after running filter.seqs to remove the “dots” from the alignment (trump=., vertical=F) the resulting alignment length is 0. Is this because the poorly aligned reads cause every position with a gap to be removed? So I’m stuck here in the analysis and need some advice on how to proceed to finally get the OTU cluster pie chart…

Thanks for your help!

My advice would be to separate the 18S from the 16S sequences and proceed from there, either discarding the 16S data entirely or analysing it as a separate data set. I’ve had a surprising amount of success extracting 16S rRNA sequences from metagenomic data just by using the classify.seqs command. You could run your full data set through classify.seqs using the SILVA database (which contains both 16S and 18S sequences) and split your data based on the classification output. Alternatively, I read a paper recently (Jervis-Bardy et al. (2015) Microbiome 3:19) that used KRAKEN to screen the data for non-specific amplicons prior to 16S analysis.
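To make the "split your data based on the classification output" step concrete, here is a minimal Python sketch. It assumes the taxonomy file from classify.seqs has one read per line, with the read name, a tab, and then a semicolon-separated lineage carrying confidence scores in parentheses. The function name split_names_by_domain and the example reads are made up for illustration, and the exact top-level labels depend on the reference version you use.

```python
import re
from collections import defaultdict


def split_names_by_domain(taxonomy_lines):
    """Group read names by the first (top-level) rank of their lineage.

    Assumed line format: name<TAB>Rank1(100);Rank2(98);...
    """
    groups = defaultdict(list)
    for line in taxonomy_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, taxonomy = line.split("\t", 1)
        top = taxonomy.split(";", 1)[0]
        # Drop a trailing confidence score such as "(100)" and any quotes.
        top = re.sub(r"\(\d+\)$", "", top).strip('"')
        groups[top].append(name)
    return dict(groups)


# Hypothetical example lines, not real classification results.
example = [
    "read1\tEukaryota(100);SAR(98);Alveolata(95);",
    "read2\tBacteria(100);Proteobacteria(99);",
    "read3\tunknown(100);unknown_unclassified(100);",
]
by_domain = split_names_by_domain(example)
print(by_domain)
```

The names in each group could then be written one per line to a text file and, as far as I know, passed to get.seqs as an accnos file to pull the matching reads out of the fasta file. mothur's own get.lineage command with a taxon option is probably the more direct route to the same split.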

The Jervis-Bardy paper is quite interesting, but I will first follow your classify.seqs recommendation to extract only the 18S reads and try the OTU clustering on those, as they are of the most interest. Thanks dwaite! But I’m still open to further advice :wink:

I would analyze the 16S and 18S rRNA gene sequences separately, as they really are two sets of questions: what’s going on with bacteria (and perhaps archaea), and what’s going on with eukaryotes.
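Separating them should also help with the filter.seqs result from the original question. As I understand it, trump=. removes every alignment column in which at least one sequence has a "." (the padding before a read starts and after it ends), so a single read that lands in a different region of the alignment is enough to leave nothing behind. The toy Python sketch below mimics that rule. It is not mothur's code, and the sequences and the trump_filter name are invented for illustration.

```python
def trump_filter(aligned, trump="."):
    """Keep only columns where no sequence carries the trump character."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    keep = [i for i in range(length) if all(s[i] != trump for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in aligned.items()}


# Invented 16-column alignment: two reads cover the same region,
# the third read aligns somewhere else entirely.
mixed = {
    "read_18S_a": "....ACGT-ACG....",
    "read_18S_b": ".....CGTTACGA...",
    "read_16S_x": "ACGTA...........",
}
euk_only = {k: v for k, v in mixed.items() if k.startswith("read_18S")}

mixed_filtered = trump_filter(mixed)
euk_filtered = trump_filter(euk_only)
print(len(mixed_filtered["read_18S_a"]), len(euk_filtered["read_18S_a"]))
```

In this toy case the mixed set keeps no columns at all, while the two reads from the same region keep the columns they share.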

I’m also not sure how well 2x75nt reads will fare in the analysis. I think it’s unlikely that these reads will overlap, which will only complicate things.