The title says it all. I have been re-analyzing some 16S data in my new lab, and during the first steps (making contigs and screening the sequences) I noticed that 1-2% of the sequences in the files were a bit longer than 400bp. I extracted some of these odd sequences and BLASTed them, which gave matches to various Ralstonia genes (not 16S, just random regions of a Ralstonia genome).
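For anyone who wants to repeat that check outside the pipeline, here is a minimal Python sketch that pulls the reads longer than 400bp out of a plain FASTA file (so they can be BLASTed separately) and reports what fraction of all reads they make up. It is not part of mothur; `read_fasta`, `long_sequences` and `fraction_long` are hypothetical helper names, and the 400bp cutoff is simply the length mentioned above.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def long_sequences(records, min_len=400):
    """Return the (header, sequence) records strictly longer than min_len bp."""
    return [(h, s) for h, s in records if len(s) > min_len]


def fraction_long(records, min_len=400):
    """Fraction of records longer than min_len bp (0.0 for empty input)."""
    records = list(records)
    if not records:
        return 0.0
    return len(long_sequences(records, min_len)) / len(records)


if __name__ == "__main__":
    demo = [">a", "ACGT" * 100 + "A", ">b", "ACGT" * 75]  # 401bp and 300bp reads
    print(fraction_long(read_fasta(demo)))  # prints 0.5
```

With a real file, `read_fasta(open(path))` can be passed in directly, and the records returned by `long_sequences` written back out as FASTA for BLAST.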
Now this is something I hadn't seen before, because contaminants usually show up as 16S, but this is something entirely different. So I wonder how this could have happened. Cross-contamination with other Illumina samples in the same run?
One more thing: the assembled contigs are 300bp, and since this is not due to residual primer or adapter sequence (they are clean 16S genes), I assumed that the folks who did the sequencing used this read-overlap approach to increase the contig size (which I personally dislike, because the quality of the second read can drop a lot).
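How much the reads overlap is simple arithmetic: forward and reverse reads of lengths r1 and r2 spanning an amplicon of length L share r1 + r2 - L bases. The sketch below computes that overlap and where it sits on the amplicon. The read lengths are hypothetical (the post doesn't state them); 2x250 is used only as an example.

```python
def expected_overlap(fwd_len, rev_len, amplicon_len):
    """Number of amplicon bases covered by both the forward and reverse read.

    0 means the reads do not overlap, so they cannot be merged by overlap
    alignment. The value is capped because an overlap can never be longer
    than either read or than the amplicon itself.
    """
    raw = fwd_len + rev_len - amplicon_len
    return max(0, min(raw, fwd_len, rev_len, amplicon_len))


def overlap_region(fwd_len, rev_len, amplicon_len):
    """1-based (start, end) amplicon coordinates covered by both reads, or None."""
    start = amplicon_len - rev_len + 1  # first base reached by the reverse read
    end = fwd_len                       # last base reached by the forward read
    if end < start:
        return None
    return max(start, 1), min(end, amplicon_len)


if __name__ == "__main__":
    for fwd, rev in [(250, 250), (150, 150)]:  # hypothetical read lengths
        print(fwd, rev, expected_overlap(fwd, rev, 300), overlap_region(fwd, rev, 300))
```

For 2x250 reads on a 300bp amplicon, the overlap spans bases 51-250, which is exactly where the 3' ends of both reads (usually their lowest-quality stretch) end up, so each of those positions is read twice.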
I noticed later on that almost half of the sequences were eliminated during the chimera search! Could this be because of low-quality sequence data, for the reason above?
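One way to test the low-quality explanation is to look at the mean Phred score at each position of the raw reverse-read FASTQ before contigs are made. The following is a minimal Python sketch assuming four-line FASTQ records and Phred+33 quality encoding (the usual encoding for current Illumina output). `parse_fastq`, `mean_quality_by_position` and `first_position_below` are hypothetical helper names, and Q20 is only an illustrative threshold.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        try:
            seq, _plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header}") from None
        yield header[1:], seq, qual


def mean_quality_by_position(records, offset=33):
    """Mean Phred score at each read position (Phred+33 assumed by default)."""
    totals, counts = [], []
    for _, _, qual in records:
        for i, ch in enumerate(qual):
            if i >= len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def first_position_below(means, threshold=20):
    """1-based position where mean quality first drops below threshold, or None."""
    for pos, q in enumerate(means, start=1):
        if q < threshold:
            return pos
    return None


if __name__ == "__main__":
    demo = ["@r1", "ACG", "+", "II#", "@r2", "ACG", "+", "I5#"]
    means = mean_quality_by_position(parse_fastq(demo))
    print(means, first_position_below(means))
```

Running the same function on the forward and reverse FASTQ files separately would show whether the second read really degrades early enough to matter, which is a more direct check than inferring it from the chimera-removal rate.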
And happy Thursday!