Classify.seqs with silva.nr_v123 files provided by mothur

ancs · March 11, 2016, 2:39pm

Hi!
I am currently analysing an 18S amplicon dataset and am new to the world of mothur and microbiome analyses in general.
While working my way through the MiSeq SOP, I realised that the silva.nr_v123.align file does not fully cover my amplicons. My sequences are ca. 150 bp long and extend about 20 bp past the end of the alignment, so running align.seqs on my dataset with the silva.nr_v123.align file removed 20 bp from the end of my sequences. I solved this problem by building a custom alignment file from the original files downloaded from SILVA.

However, I was not able to generate files that cover my entire amplicon and can be used with the classify.seqs command. I followed the README on the mothur blog (http://blog.mothur.org/2015/12/03/SILVA-v123-reference-files/) to make these files, but did not succeed: permission issues on the computer I am working on prevented me from saving the fasta_mothur.eft file in the right folder.

So, would it be possible for you to provide the mothur-compatible silva.full_v123.fasta file? Then I could make my own taxonomy and template files to use with the classify.seqs command.

Alternatively, do you have any idea if using the classify.seqs command with the silva.nr_v123.ng.fasta (i.e. 20 bp too short) and silva.nr_v123.tax files on my dataset would significantly impact the classifications?

Kind regards. Anke.

pschloss · March 14, 2016, 12:59pm

I’m pretty sure that what is happening is that we have removed the distal primer region of the gene. This is a feature rather than a bug since the sequence of the region the primer anneals to is generally not trusted because of mispriming issues. So I would discourage using that region for alignment or classification.

Make sense?
Pat

ancs · March 16, 2016, 9:55am

Ummm, yes and no :?

I understand that you removed the primer regions and that those should not be used for alignment and classification. But you used a different primer set (16S, 27f and 1492r) than I did (18S, 1391f and 1510r). So, when I use trim.seqs with my oligos file, the primers are removed from my sequences, too. But because my reverse primer lies downstream of yours, my sequences extend about 18 bp past the 3’ end of the mothur fasta files provided.
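To make the overhang concrete, here is a minimal Python sketch of how one might measure it. It assumes the query and a reference sequence are already aligned in the same column coordinates and that gaps are written as `.` or `-`, as in mothur alignment files. The function names and the toy sequences are made up for illustration and are not part of mothur.

```python
# Gap characters as used in mothur-style alignments (assumption: '.' and '-').
GAP_CHARS = {".", "-"}


def alignment_span(aligned_seq):
    """Return 1-based (start, end) columns of the first and last base,
    or None if the sequence contains only gaps."""
    positions = [i for i, ch in enumerate(aligned_seq, start=1)
                 if ch not in GAP_CHARS]
    if not positions:
        return None
    return positions[0], positions[-1]


def bases_past_reference(aligned_query, reference_end):
    """Count query bases that sit in columns after the reference's last base."""
    return sum(1 for i, ch in enumerate(aligned_query, start=1)
               if i > reference_end and ch not in GAP_CHARS)


# Toy data (hypothetical): the query runs 4 bases past the reference's end.
reference = "..ACGT-ACGT....."
query = "....GT-ACGTTTGA."

ref_start, ref_end = alignment_span(reference)
overhang = bases_past_reference(query, ref_end)
print(ref_start, ref_end, overhang)
```

Bases counted by `bases_past_reference` are the ones a reference that stops at `reference_end` has no columns for, which corresponds to the part of the amplicon described above as hanging over the 3’ end.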

So I wonder how using these slightly too short files will impact the classification of my sequences. Or, even better, how I can obtain a taxonomy file and a corresponding fasta file that cover the entire region of my amplicons (keeping in mind that I have trouble with ARB that I have unfortunately still not been able to solve).

Does that make sense?

Kind regards, Anke.

pschloss · March 18, 2016, 11:46am

If I recall, the full-length 18S data is pretty sparse. I’m afraid you’ll probably have to regenerate your own reference file if you need something at the 3’ end of the gene.

Pat
