Reference database and custom database

McCoyk · November 18, 2016, 8:41pm

I’m a novice limping my way through the MiSeq SOP. I’m so grateful for the detail of the resource. Nevertheless, I’ve gotten hung up.

MiSeq SOP, pcr.seqs step:

It says we need the reference database silva.bacteria.fasta. Where is this? If you google it, you get:

This has links for 3 releases from SILVA (there is a newer release now, 128, that isn’t on this page). I tried to go through the steps listed on:

http://blog.mothur.org/2015/12/03/SILVA-v123-reference-files/

to create a database for release 128, but got hung up on the "Screening the sequences" step. It looks like the following commands need to be executed in Unix, though this wasn’t specified:
mothur "#screen.seqs(fasta=silva.full_v123.fasta, start=1044, end=43116, maxambig=5, processors=8);
pcr.seqs(start=1044, end=43116, keepdots=T);
degap.seqs();
unique.seqs();"
grep ">" silva.full_v123.good.pcr.ng.unique.fasta | cut -f 1 | cut -c 2- > silva.full_v123.good.pcr.ng.unique.accnos
mothur "#get.seqs(fasta=silva.full_v123.good.pcr.fasta, accnos=silva.full_v123.good.pcr.ng.unique.accnos)"
#generate alignment file
mv silva.full_v123.good.pcr.pick.fasta silva.nr_v123.align
#generate taxonomy file
grep "^>" silva.full_v123.fasta | cut -f 1,3 | cut -c 2- > silva.full_v123.tax.temp

Can this be done on a command line in Windows? That's what I've been using (except when I needed arb, for which I used our Mac; we don't have Unix). I don't understand why the screen.seqs, pcr.seqs, degap.seqs, and unique.seqs commands are all grouped together like that. In the rest of the MiSeq SOP, each command is one line and is executed separately, and I don't know how to enter all of these on the command line so that they all execute. When I hit return after "#screen.seqs(fasta=silva.full_v128.fasta, start=1044, end=42900, maxambig=5, processors=2)", it started executing and produced two output files: silva.full_v128.good.fasta and silva.full_v128.bad.accnos.

Is pcr.seqs(start=1044, end=42900, keepdots=T) supposed to execute on silva.full_v128.good.fasta? I tried this and got errors:

mothur fasta=/Users/me/silva.full_v128.good.fasta, start=1044, end=42900, keepdots=T)
Invalid command.
Valid commands are: align.check, align.seqs, amova, anosim, bin.seqs……unique.seqs, venn.

Then I tried putting all the commands on one line:

mothur “#screen.seqs(fasta=silva.full_v128.fasta, start=1044, end=42900, maxambig=5); pcr.seqs(start=1044, end=42900, keepdots=T); degap.seqs(); unique.seqs();”
keepdots is not a valid parameter.
The valid parameters are : fasta, contigsreport, alignreport……and minsim.
[ERROR]: cannot convert 5) to an integer.
Using 1 processors.
[ERROR]: did not complete screen.seqs
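
A note on running several commands at once: besides the quoted "#...;...;" form, mothur can also read its commands from a batch file passed as its only argument (mothur file.batch), one command per line with no hash or quotes. That avoids depending on how a particular shell handles quotes and semicolons. Below is a minimal Python sketch that writes such a file for the four commands above. The function name build_batch and the file name silva_v128.batch are made up for illustration, and the parameter values are simply the ones quoted in this post.

```python
from pathlib import Path


def build_batch(fasta: str, start: int, end: int, processors: int = 2) -> str:
    """Return mothur batch-file text: one command per line, no '#' prefix, no quotes."""
    commands = [
        f"screen.seqs(fasta={fasta}, start={start}, end={end}, maxambig=5, processors={processors})",
        # As in the blog recipe, the later commands name no input file and rely on
        # mothur carrying the current fasta file forward from the previous step.
        f"pcr.seqs(start={start}, end={end}, keepdots=T)",
        "degap.seqs()",
        "unique.seqs()",
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    # Hypothetical output name; the coordinates are the release 128 values from this post.
    Path("silva_v128.batch").write_text(
        build_batch("silva.full_v128.fasta", 1044, 42900)
    )
```

After running the script, typing mothur silva_v128.batch in the folder that holds the FASTA file should run the four commands in order, assuming the mothur executable can be found from that folder.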

I tried removing the hash and quotes. That didn't work. If I ever get this part going, can I use findstr instead of grep? I'm on a Windows machine.

Since I couldn't get that to work, I decided to go ahead with what's already been done for release 123. So, back to needing silva.bacteria.fasta. Where is this for release 123? On https://www.mothur.org/wiki/Silva_reference_files there is a "bacterial references" link at the bottom that has 14,956 sequences in it, not the 152,308 bacterial sequences supposedly in release 123. When I download "full length sequences and taxonomy references" from release 123 and unzip it, I get a Silva.nr_v123 folder with an 8.4 GB silva.nr_v123.align file and a 16 MB silva.nr_v123.tax file. No sign of silva.bacteria.fasta. So is there only one of these, and it's from release 102? Or am I supposed to do something to that giant align file from release 123 to get it to the silva.bacteria.fasta stage?

Since I'm stymied on the reference databases, I decided to see how far I could get with my own database. What I REALLY want to do is take my sample data and use it to query a tiny database (165 sequences from GenBank) to see whether those sequences occur in my samples. I need a taxonomy file for my database, so I need to run classify.seqs.

The wiki states that classify.seqs needs FASTA-formatted input and database sequence files, plus a taxonomy file. So I can’t do this step either, because I don’t have a database sequence file or a taxonomy file.

To condense:

Where is silva.bacteria.fasta for release 123?

If nowhere, how do I generate it from the .align file?

How do I execute the commands for screen.seqs and the rest from
http://blog.mothur.org/2015/12/03/SILVA-v123-reference-files/

on a Windows machine? These commands are for Unix, I think.
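
On the findstr question: the grep and cut lines in that recipe only pull fields out of the FASTA header lines, so that part can be done without Unix tools. Below is a minimal Python sketch of the same extraction. It assumes the header fields are tab-separated (cut's default delimiter) and that ">" appears only at the start of header lines. The names header_fields and write_fields are hypothetical helpers, not part of mothur.

```python
from typing import Iterable, Iterator, Sequence


def header_fields(lines: Iterable[str], keep: Sequence[int] = (0,)) -> Iterator[str]:
    """Yield the selected tab-separated header fields, minus the leading '>'."""
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line.rstrip("\r\n").split("\t")
        # Like cut, silently skip requested fields that a header does not have.
        picked = [fields[i] for i in keep if i < len(fields)]
        # Dropping the first character removes '>' (the "cut -c 2-" step).
        yield "\t".join(picked)[1:]


def write_fields(fasta_path: str, out_path: str, keep: Sequence[int] = (0,)) -> None:
    """Stream a FASTA file and write one extracted header row per line."""
    with open(fasta_path) as src, open(out_path, "w", newline="\n") as dst:
        for row in header_fields(src, keep):
            dst.write(row + "\n")


# Intended use, with the file names from the blog recipe:
#   write_fields("silva.full_v123.good.pcr.ng.unique.fasta",
#                "silva.full_v123.good.pcr.ng.unique.accnos")           # cut -f 1
#   write_fields("silva.full_v123.fasta",
#                "silva.full_v123.tax.temp", keep=(0, 2))               # cut -f 1,3
```

Field positions are zero-based here, so keep=(0,) mirrors cut -f 1 and keep=(0, 2) mirrors cut -f 1,3. The file is read line by line, so the size of the input should not matter much.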

pschloss · November 22, 2016, 7:22pm

You want the Recreated Silva Seed: https://www.mothur.org/w/images/1/15/Silva.seed_v123.tgz

Pat
