I am currently working with a large Illumina (MiSeq) dataset of 16S rRNA gene sequences, generated with primers that target both archaea and bacteria (V4 region). In previous 454 studies I was able to identify and eliminate organisms from the other kingdom by alignment against the SILVA archaeal and bacterial databases. Sequences that belonged to the other kingdom generally came out as ‘unclassified’; if I could then align and identify them within the other kingdom, I removed them from the corresponding dataset before further analysis. This does not work with the Illumina dataset (maybe due to the shorter read length?): almost all archaeal sequences seem to align within different bacterial groups (down to the phylum level), and bacterial sequences align within archaeal groups. Is there an elegant way to split the dataset into bacteria and archaea within mothur?
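To make the filtering logic from the 454 workflow concrete, here is a minimal Python sketch. It is only an illustration, not a mothur feature: the function names and toy reads are hypothetical, and it assumes two taxonomy listings in a tab-separated `name<TAB>lineage;` layout with an `unclassified` (or `unknown`) top-level label for reads that could not be placed.

```python
def read_taxonomy(lines):
    """Parse 'name<TAB>taxon;taxon;...' lines into {name: [taxa]}."""
    taxonomy = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, lineage = line.split("\t", 1)
        # Drop confidence scores such as 'Bacteria(100)' (assumed format).
        taxa = [t.split("(")[0] for t in lineage.strip(";").split(";") if t]
        taxonomy[name] = taxa
    return taxonomy


def is_classified(taxa):
    """A read counts as classified if its top-level label is a real taxon."""
    return bool(taxa) and taxa[0].lower() not in ("unknown", "unclassified")


def split_by_domain(bac_tax, arc_tax):
    """Assign each read to the one reference in which it was classified."""
    result = {"bacteria": [], "archaea": [], "ambiguous": []}
    for name in sorted(set(bac_tax) | set(arc_tax)):
        in_bac = is_classified(bac_tax.get(name, []))
        in_arc = is_classified(arc_tax.get(name, []))
        if in_bac and not in_arc:
            result["bacteria"].append(name)
        elif in_arc and not in_bac:
            result["archaea"].append(name)
        else:
            # Classified in both references, or in neither.
            result["ambiguous"].append(name)
    return result


# Toy data (hypothetical reads, not from the real dataset).
bac_lines = [
    "read1\tBacteria(100);Firmicutes(98);",
    "read2\tunclassified;",
    "read3\tBacteria(85);Proteobacteria(80);",
]
arc_lines = [
    "read1\tunclassified;",
    "read2\tArchaea(100);Euryarchaeota(99);",
    "read3\tArchaea(82);Crenarchaeota(80);",
]

split = split_by_domain(read_taxonomy(bac_lines), read_taxonomy(arc_lines))
print(split)
```

With the 454 reads, nearly everything landed in the `bacteria` or `archaea` bucket. With the MiSeq reads, most sequences behave like `read3` and end up as `ambiguous`, which is exactly the problem.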
Another problem, unrelated to my main question but related to the same dataset: after aligning to either the bacterial or the archaeal SILVA reference alignment, UCHIME identifies up to 60% of all sequences as chimeras, which makes no sense.
Can anybody help me with this issue, or has anyone had to deal with similar datasets?