I am currently trying to align a large dataset of archaeal 16S sequences, following a modified version of the Schloss SOP. After running the command
“align.seqs(fasta=XX.chop.fasta, reference=silva.archaea.fasta)”
I noticed that about a quarter of my sequences didn’t align at the start and end of the alignment, which means they get treated as junk in later steps. After checking these sequences carefully, I think they are most likely valid sequences (they just don’t align well at the start and end).
This would leave me having to align these sequences “by hand”. Do you see another option? Is there perhaps a “better” or modified version of the SILVA alignment?
I don’t see this as just a personal problem: a lot of potentially important phylogenetic information could get lost this way in common standardized data analysis.
So it’s likely that many of the reference sequences in silva.archaea.fasta are not full length. You might need to customize a database so that it includes the regions you are interested in.
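One way to check whether the reference sequences actually span your region is to look at where each aligned sequence starts and ends. Below is a minimal Python sketch of that idea, not part of mothur. It assumes the reference is an aligned FASTA file in which `.` and `-` mark gaps; the helper names (`read_fasta`, `span`, `coverage`) and the toy sequences are made up for illustration, and the region coordinates are ones you would have to supply yourself.

```python
from io import StringIO

GAP_CHARS = ".-"  # assumption: both characters mark gaps in the alignment


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def span(aligned_seq):
    """Return 1-based (start, end) columns of the first and last base,
    or None if the sequence contains only gap characters."""
    positions = [i for i, c in enumerate(aligned_seq, start=1)
                 if c not in GAP_CHARS]
    if not positions:
        return None
    return positions[0], positions[-1]


def coverage(handle, region_start, region_end):
    """Return (covered, total): how many sequences have bases that start
    at or before region_start and end at or after region_end."""
    total = covered = 0
    for _, seq in read_fasta(handle):
        total += 1
        s = span(seq)
        if s is not None and s[0] <= region_start and s[1] >= region_end:
            covered += 1
    return covered, total


# Toy alignment (hypothetical data): ref1 spans columns 3-12, ref2 only 5-10.
demo = StringIO(">ref1\n..ACGT--ACGT..\n>ref2\n....GT--AC....\n")
print(coverage(demo, 3, 12))
```

To run it on a real file, you would pass `open("silva.archaea.fasta")` instead of the toy `StringIO` object, along with the start and end columns of your amplicon in the alignment. If only a small fraction of the references cover the region, that would be consistent with the ends of your reads failing to align.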
Thanks for your reply! Unfortunately, when I used align.seqs(fasta=349F450F.shhh.trim.unique.fasta, reference=silva.archaea.fasta, flip=t, processors=2) I obtained the same output.
The sequences aligned just fine against the silva.bacteria.fasta file. Then, when I classified these newly aligned sequences using the Greengenes files gg_13_8_99.fasta and gg_13_8_99.gg.tax, they were all classified as Bacteria.
Now my question is: how can that be if the primers used were specific for Archaea (Arch 349F/Arch 806R)? In my search for an answer I even found a paper where pyrosequencing was conducted at the same facility I sent my samples to, using the same primers. In that case, the study was able to successfully identify members of the Archaea.
Could it be that I am missing an important step in the pipeline?
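For context on the flip=t option used above: as I understand it, it lets align.seqs also try the reverse complement of a read that aligns poorly. The Python sketch below only illustrates what a reverse complement is, in case it helps to inspect a few reads by hand; it is not mothur's implementation, and the function name and example read are made up.

```python
# Complement table for DNA/RNA bases plus IUPAC ambiguity codes.
# S and W are their own complements, so they pass through unchanged.
COMPLEMENT = str.maketrans(
    "ACGTURYKMBDHVNacgturykmbdhvn",
    "TGCAAYRMKVHDBNtgcaayrmkvhdbn",
)


def reverse_complement(seq):
    """Return the reverse complement of an unaligned nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


read = "ACGGT"  # hypothetical read
print(reverse_complement(read))  # ACCGT
```

If reads only look sensible after reverse complementing, orientation is the likely issue; if flipping makes no difference, as reported above, the problem is more likely the reference coverage or the identity of the reads themselves.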