I followed the 454 SOP up to the removal of chimeras and stopped there. I then ran deunique.seqs, split.groups and degap.seqs, since I wanted to compare the taxonomic classification output of mothur with a different pipeline. When checking the output fasta files, I realized that some of the headers & sequences were duplicated. That was only the case for some sequences, e.g. 70 sequences in a 3000-sequence sample, but it happened in all 12 of my samples. I have no idea at which step that happened. Do you have any ideas? Has that been reported before?
No, I haven’t done it yet. I just wanted to know whether anybody else has ever encountered something similar. I guess I wouldn’t have noticed if I had followed the complete SOP. In the original fasta file from the sequencing company there are no duplicate headers & sequences, of course.
Since it’s just a duplication, the overall picture of relative abundances etc. doesn’t change, but in total there are roughly 1300 sequences that were not present at the beginning of processing the data.
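In case it helps anyone check their own files, here is a minimal Python sketch of how duplicated headers in a fasta file could be counted. It is not part of mothur; the function names are made up for illustration, and it assumes the sequence name is the first whitespace-delimited token after the ">" on each header line.

```python
from collections import Counter


def duplicated_headers(fasta_lines):
    """Return {sequence_name: count} for every name seen more than once.

    Assumption: the sequence name is the first whitespace-delimited
    token after '>' on each header line.
    """
    counts = Counter(
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    )
    return {name: n for name, n in counts.items() if n > 1}


def duplicated_headers_in_file(path):
    """Convenience wrapper: scan a fasta file on disk line by line."""
    with open(path) as handle:
        return duplicated_headers(handle)


def extra_copies(duplicates):
    """Number of surplus records, i.e. copies beyond the first of each name."""
    return sum(n - 1 for n in duplicates.values())
```

Running duplicated_headers_in_file on the output of each step (after deunique.seqs, after split.groups, after degap.seqs) should show at which step the extra copies first appear, and extra_copies gives the number of surplus sequences per file.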