Hi!
I am currently analysing an 18S amplicon dataset and am new to the world of mothur and microbiome analyses in general.
While working my way through the MiSeq SOP, I realised that the silva.nr_v123.align file does not fully cover my amplicons. My sequences are ca. 150 bp long and extend about 20 bp past the end of the alignment, so running the align.seqs command on my dataset with the silva.nr_v123.align file removed 20 bp from the end of my sequences. I solved this problem by making a custom alignment file from the original files downloaded from SILVA.
However, I was not able to generate files that cover my entire amplicon and that can be used in the classify.seqs command. I followed the README on the mothur blog (http://blog.mothur.org/2015/12/03/SILVA-v123-reference-files/) to make these files, but did not succeed: permission issues on the computer I am working on unfortunately prevented me from saving the fasta_mothur.eft file in the right folder.
So, would it be possible for you to provide the mothur-compatible silva.full_v123.fasta file? Then I can make my own taxonomy and template files to use in the classify.seqs command.
Alternatively, do you have any idea if using the classify.seqs command with the silva.nr_v123.ng.fasta (i.e. 20 bp too short) and silva.nr_v123.tax files on my dataset would significantly impact the classifications?
I’m pretty sure that what is happening is that we have removed the distal primer region of the gene. This is a feature rather than a bug, since the sequence of the region the primer anneals to is generally not trusted because of mispriming issues. So I would discourage using that region for alignment or classification.
I understand that you removed the primer regions and that those should not be used for alignment and classification. But you used a different primer set (16S, 27f and 1492r) than I did (18S, 1391f and 1510r). So, when I use trim.seqs with my oligos file, primers are removed from my sequences, too. But because I used a reverse primer downstream of your reverse primer, my sequences extend about 18 bp past the 3’ end of the mothur fasta files provided.
So I wonder how using these slightly too short files will impact the classification of my sequences. Or even better, how I can obtain a taxonomy file and a corresponding fasta file that cover the entire region of my amplicons (keeping in mind that I have trouble with ARB that I have unfortunately still not been able to solve).
If I recall correctly, the full-length 18S data is pretty sparse. I’m afraid you’ll probably have to generate your own reference file if you need something at the 3’ end of the gene.
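For anyone wanting to check how many bases a given reference alignment actually costs them, here is a rough Python sketch (not part of mothur) that compares the number of bases in each sequence before and after align.seqs. It assumes both files are plain FASTA, that the aligned file uses "." and "-" as gap characters, and that sequence names match between the two files; the helper names read_fasta, count_bases and bases_lost are made up for this illustration.

```python
from typing import Dict, Iterable

# Characters treated as alignment padding/gaps (assumption: mothur-style output).
GAP_CHARS = {".", "-"}


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {name: sequence}; name is the first token of the header."""
    seqs: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def count_bases(seq: str) -> int:
    """Count characters that are not gap or padding symbols."""
    return sum(1 for c in seq if c not in GAP_CHARS)


def bases_lost(unaligned: Dict[str, str], aligned: Dict[str, str]) -> Dict[str, int]:
    """For each sequence present in both inputs, report bases missing after alignment."""
    return {
        name: count_bases(unaligned[name]) - count_bases(aligned[name])
        for name in unaligned
        if name in aligned
    }


if __name__ == "__main__":
    # Toy example; real input would come from open(...) on the fasta and align files.
    before = read_fasta([">s1", "ACGTACGT"])
    after = read_fasta([">s1", "..ACG-TA.."])
    print(bases_lost(before, after))
```

A consistent loss of roughly the same number of bases across all sequences would point to the reference ending short of the amplicon rather than to poor alignment of individual reads.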