mothur

non-16S contaminants in my amplicon

Hi all

The title says it all. I have been re-analyzing some 16S data in my new lab, and during the first steps of making contigs and screening the sequences I noticed that about 1-2% of the sequences in the files were a bit longer than 400 bp. I extracted some of these odd sequences and BLASTed them, which gave matches to various Ralstonia genes (not 16S, just random regions of a Ralstonia genome).
This is something I had not seen before, because contaminants usually show up as 16S sequences, and this is something entirely different. So I wonder how this could have happened. Cross-contamination with other Illumina samples in the same run?
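
For anyone who wants to pull out the over-length sequences the same way, here is a minimal Python sketch of the length screen described above. It is not part of mothur; the function names and the 400 bp cutoff are assumptions for illustration, and it expects a plain (unaligned) FASTA file such as the one written after making contigs.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (sequence name, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) pairs from the lines of a FASTA file."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the name.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_length(records: Iterable[Record], max_len: int = 400):
    """Split records into (expected, too_long) using an assumed cutoff."""
    expected: List[Record] = []
    too_long: List[Record] = []
    for name, seq in records:
        if len(seq) > max_len:
            too_long.append((name, seq))
        else:
            expected.append((name, seq))
    return expected, too_long


def fraction_too_long(expected: List[Record], too_long: List[Record]) -> float:
    """Fraction of all sequences that exceed the cutoff."""
    total = len(expected) + len(too_long)
    return len(too_long) / total if total else 0.0
```

The `too_long` list can then be written back out as FASTA and BLASTed to see what the over-length sequences actually are.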

One more thing: the assembled contigs are 300 bp long. Since this is not due to residual primers or adapters (the contigs are clean 16S sequence), I assume the folks who did the sequencing relied on only partially overlapping reads to increase the contig size (which I personally dislike, because the quality in the 2nd read can drop a lot).
I noticed later on that almost half of the sequences were eliminated during the chimera search! Could this be because of low-quality sequence data resulting from the above?

cheers everyone
and happy Thursday

It's possible to have carryover between sequencing runs, though 1-2% seems really high. For you to even detect carryover, both runs would have to share the same indexes, which could happen if the previous run had a Nextera Ralstonia genome and you used the Nextera protocol to generate the 16S.

I'd need more info on the 300 bp issue: primers used, etc.

Sorry for taking so long to reply. I haven't managed to find the protocol they used, as this was some time ago and it was done by some old collaborators who have moved on.
What I did was subsample the contigs (after the make.contigs command) and map the reads from the fastq files onto them, which showed that there was a 200 bp overlap, with 50 bp on each side covered by a single read only.
I then trimmed off the single-read parts of the contigs, kept only the 200 bp overlap, and redid the analysis. After doing that (surprise, surprise) the number of chimeras dropped dramatically, and so did the complexity of the distance matrix, the number of unique sequences, etc.
I will try to dig up some more details on the protocol and primers they used!
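
In case it helps anyone, here is a rough Python sketch of the trimming step described above. It is not a mothur command, and the function names are made up for illustration. It assumes the forward read starts at the first base of the contig and the reverse read ends at the last base, and that both reads were 250 bp long, which is only inferred from the numbers above (200 bp overlap plus 50 bp of single-read sequence).

```python
from typing import Tuple


def overlap_span(contig_len: int, fwd_len: int, rev_len: int) -> Tuple[int, int]:
    """Return the 0-based, half-open (start, end) of the region covered by both reads.

    Assumes the forward read covers contig[0:fwd_len] and the reverse read
    covers contig[contig_len - rev_len:contig_len].
    """
    if fwd_len > contig_len or rev_len > contig_len:
        raise ValueError("a read cannot be longer than the contig")
    start = contig_len - rev_len  # where the reverse read begins
    end = fwd_len                 # where the forward read stops
    if end <= start:
        raise ValueError("reads do not overlap")
    return start, end


def trim_to_overlap(contig: str, fwd_len: int, rev_len: int) -> str:
    """Keep only the part of the contig supported by both reads."""
    start, end = overlap_span(len(contig), fwd_len, rev_len)
    return contig[start:end]


# Numbers from this thread: 300 bp contig, assumed 250 bp reads.
start, end = overlap_span(300, 250, 250)
print(start, end, end - start)
```

With real data the read and contig lengths vary from pair to pair, so the span would need to be computed per contig rather than with fixed numbers.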