reference for alignment vs taxonomy


I am trying to decide whether I should append our environmental sequences to the standard mothur SILVA files for alignment or for taxonomy. If I understand the related posts correctly, there is probably not much advantage to adding to the alignment files, since the SILVA reference files should already have plenty of sequences to get the alignment right. (I am using a concatenated version of the arc and bac files that has been trimmed with pcr.seqs to the universal TAG priming regions of interest.)

However, some of my taxa are not being correctly assigned with the mothur SILVA tax files, so I would like to beef them up with environmental sequences I expect to recover. But I am not sure how robust the reference file needs to be for classification. Should I only use full-length sequences (most of what I have are partial sequences from clone libraries, ~700 bp)? No Ns? Etc.


So the SILVA alignment is great for aligning sequences, but not so great (IMO) for classification. I would strongly steer you towards the RDP or the greengenes training sets that we provide on the wiki.



I am interested in using mothur for my 454 dataset.

I demultiplexed the data with QIIME and tried to align it against the SILVA reference in mothur.

My sequences did not align well, and screen.seqs produced an empty file.

How can I improve the alignment?

Thank you


Are you sure that your sequences are going in the right direction? Sometimes people sequence from the 3’ end to the 5’ end of the gene and then don’t flip the reads. Or they flip reads that were already sequenced from 5’ to 3’. If you can post the output of summary.seqs on the fasta file you are getting out of align.seqs, we can take a look and help.
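
If direction does turn out to be the problem, the fix is to reverse complement the reads before aligning. mothur has its own reverse.seqs command for this; the plain-Python sketch below is only meant to show what that operation does to a fasta file. It is not part of mothur, and the function names (reverse_complement, read_fasta, flip_fasta) are made up for illustration. It assumes unaligned sequences containing only A, C, G, T, U, or N.

```python
# Illustrative sketch only: hypothetical helper names, not mothur code.
# Assumes unaligned nucleotide sequences (A, C, G, T, U, N; either case).

# Translation table mapping each base to its complement (U is treated like T).
COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of fasta lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def flip_fasta(lines):
    """Return a list of (name, reverse-complemented sequence) pairs."""
    return [(name, reverse_complement(seq)) for name, seq in read_fasta(lines)]
```

To try it on a file, pass an open file handle, for example flip_fasta(open("reads.fasta")), where reads.fasta is a placeholder name. If the flipped reads align well where the originals did not, orientation was the issue.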