I followed the 454 SOP up to the chimera removal step and stopped there. I then ran deunique.seqs, split.groups and degap.seqs, because I wanted to compare mothur's taxonomic classification output with that of a different pipeline. When I checked the output fasta files, I noticed that some headers and their sequences were duplicated. Only some sequences were affected, e.g. 70 sequences in a sample of 3000, but it happened in all 12 of my samples. I have no idea at which step this happened. Do you have any ideas? Has this been reported before?
Thanks for any help and comments!
Have you tried it with our current version?
No, I haven’t done that yet. I just wanted to know whether anybody else has encountered something similar. I guess I wouldn’t have noticed if I had followed the complete SOP. The original fasta file from the sequencing company contains no duplicate headers or sequences, of course.
Since it’s just a duplication, the overall picture of relative abundances etc. doesn’t change, but in total there are roughly 1300 sequences that were not present at the start of data processing.
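For anyone who wants to run the same check, here is a minimal sketch of how duplicated headers in a fasta file could be counted. It is not part of mothur. It assumes a plain fasta file in which header lines start with ">" and the sequence name is everything up to the first whitespace. The function name `duplicated_headers` and the file name in the comment are made up for illustration.

```python
from collections import Counter


def duplicated_headers(fasta_lines):
    """Return {sequence_name: count} for names that occur more than once.

    Assumes header lines start with '>' and that the sequence name is
    the text between '>' and the first whitespace character.
    """
    counts = Counter(
        line[1:].split()[0]
        for line in fasta_lines
        # Skip sequence lines and headers that contain nothing after '>'.
        if line.startswith(">") and len(line.strip()) > 1
    )
    return {name: n for name, n in counts.items() if n > 1}


# Small in-memory example: seqA appears twice, seqB once.
example = [">seqA", "ACGT", ">seqB", "ACGT", ">seqA", "ACGT"]
print(duplicated_headers(example))  # prints {'seqA': 2}

# For a real file (hypothetical name), pass the open file handle:
# with open("sample1.fasta") as handle:
#     print(duplicated_headers(handle))
```

Running this on the fasta file after each step (deunique.seqs, split.groups, degap.seqs) should show at which step the duplicates first appear.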
If you post the exact commands you ran, I may be able to spot the issue.