formatting database into mothur format

laszlo · October 22, 2015, 10:23pm

Hi,

I am a new user of mothur and of the forum.
First of all, I would like to say many thanks to the authors for the software and the detailed wiki. It is great.
My first question: if I have a database, how can I format it for mothur so that I can use it in the alignment? Is there a special format?

Thanks for your help.

pschloss · October 29, 2015, 11:09am

Nothing too special, it just needs to be an aligned fasta-formatted file.

laszlo · November 7, 2015, 5:40am

Dear Patrick Schloss,

Thanks for your answer.
Does the fasta file need to be aligned? I am working on fungal ITS sequences, and they are not easy to align. Does it work without the alignment?
Many thanks.
Laszlo

pschloss · November 9, 2015, 3:20pm

Yeah, if you’re using ITS then you really can’t do an alignment because there isn’t positional homology. I’d suggest using pairwise.seqs or pre.cluster. The next release of mothur will incorporate VSEARCH, which will likely be a great help for people doing ITS. The caveat is that the OTU assignment may not be as good as what you’d get with pairwise.seqs/cluster.

Pat
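
To illustrate why pairwise comparison does not need a shared alignment, here is a minimal Python sketch. It is not mothur's pairwise.seqs implementation: it swaps in plain edit distance (Levenshtein) as a simple stand-in for a pairwise-alignment distance, and the function names `edit_distance` and `pairwise_distance` are hypothetical. Each pair of sequences is compared on its own, so no column-by-column positional homology across the whole dataset is required.

```python
def edit_distance(a, b):
    """Levenshtein distance between two unaligned sequences (illustrative stand-in)."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def pairwise_distance(a, b):
    """Edit distance scaled by the longer sequence length (an assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return edit_distance(a, b) / longest
```

Distances computed this way for every pair could then feed a clustering step; how mothur scores gaps and mismatches is a separate matter not covered here.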

laszlo · November 15, 2015, 10:13pm

Dear Patrick Schloss,

We have a highly curated dataset and we would like to have assignment at the species level (or even deeper, such as varieties). Do you think this kind of identification is possible with the mothur pipeline? If you are interested, we are happy to share our data and work on it. It would be a great help for the community, especially in the field we are working in.

hgmMBL · September 12, 2016, 7:47pm

So I have a fasta formatted reference file derived from silva123 alignment and tried to create the corresponding tax file.

When I run classify.seqs, I get nothing but these error messages:

'U62813.UniAr107' is in your template file and is not in your taxonomy file. Please correct.
'U70679.Unc02vpl' is in your template file and is not in your taxonomy file. Please correct.
'AY344367.Unc02vrl' is in your template file and is not in your taxonomy file. Please correct.
'AY344412.Unc02vro' is in your template file and is not in your taxonomy file. Please correct.
'AY345533.Unc02vrz' is in your template file and is not in your taxonomy file. Please correct.
'AY345543.Unc02vs1' is in your template file and is not in your taxonomy file. Please correct.

Can someone please post a sample of what the fasta and tax files should look like, or a fuller description of their format than this:
“The command requires that you provide a fasta-formatted input and database sequence file and a taxonomy file for the reference sequences” and I think I have done that.

My fasta and tax look like this:

>AY230195.PeuSpec7
TG--GC-C-----------------------------------------------------------------------------------------------------------------------------------------…

AY230195.PeuSpec7 Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;
AY230764.BclSpe16 Bacteria;Firmicutes;Bacilli;Bacillales;Paenibacillaceae;Paenibacillus;

“Silva.bacteria.zip” files look like this:

>AF515816.1
…

AB000389.1 Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Pseudoalteromonadaceae;Pseudoalteromonas;
AB000699.1 Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas;
AB000700.1 Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas;

westcott · September 13, 2016, 2:20pm

That type of error usually occurs when there are spaces in the taxonomy. For example if you had something like:

seq1 D_0__Bacteria;D_1__Bacteroidetes;D_2__Sphingobacteriia;D_3__Sphingobacteriales;D_4__env.OPS 17;D_5__uncultured Bacteroidetes bacterium;D_6__uncultured Bacteroidetes bacterium;

D_6__uncultured Bacteroidetes bacterium contains spaces.

You can find these issues with the debug flag. Setting the debug flag will allow you to see what mothur is reading from the taxonomy file.

mothur > set.dir(debug=t)

NOTE: In version 1.39.0 mothur will be able to handle spaces in the taxonomy.
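
A minimal Python sketch of checking a taxonomy file for the whitespace problems described above before running classify.seqs. It assumes each line is a sequence name, then whitespace, then a semicolon-delimited taxonomy that ends in a semicolon, as in the examples in this thread. The function name `find_taxonomy_problems` is hypothetical, and this is not how mothur itself parses the file.

```python
def find_taxonomy_problems(line):
    """Return a list of likely formatting problems in one taxonomy line."""
    problems = []
    text = line.rstrip("\n")
    # Anything left after removing the newline (spaces, tabs, a Windows "\r")
    # counts as trailing whitespace.
    if text != text.rstrip():
        problems.append("trailing whitespace")
    # Split into the sequence name and everything after the first whitespace run.
    parts = text.strip().split(None, 1)
    if len(parts) != 2:
        problems.append("expected a name and a taxonomy string")
        return problems
    name, taxonomy = parts
    if any(ch.isspace() for ch in taxonomy):
        problems.append("whitespace inside taxonomy")
    if not taxonomy.endswith(";"):
        problems.append("missing final semicolon")
    return problems


def scan_taxonomy_file(path):
    """Print every line number in the file at `path` that has a problem."""
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            problems = find_taxonomy_problems(line)
            if problems:
                print(number, ", ".join(problems))
```

Calling `scan_taxonomy_file` on your own taxonomy file lists the offending line numbers; a clean file prints nothing.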

hgmMBL · September 14, 2016, 2:25pm

Sarah,

I set the debug flag and I get errors for probably every line in my file. I grepped for spaces and found none. Many of the errors are that there’s a final semicolon missing.

cat silva.v6.tax | grep ";" | wc
14914 33037 1667572

cat silva.v6.tax | grep -v ";" | wc
0 0 0

less mothur.1473862203.logfile | grep -v DEBUG | grep "is in your template file and is not in your taxonomy file. Please correct" | wc
14914 223710 1413841

UPDATE: I believe the problem was spaces at the END of the lines; I am about to try again.
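
For anyone hitting the same thing, a rough Python sketch of the two follow-up steps: stripping whitespace from the end of every taxonomy line, and listing fasta names that really are absent from the taxonomy file. It assumes fasta header lines start with ">" and that the sequence name is the first whitespace-delimited token in both files. All function names are hypothetical, and the name check only finds genuine mismatches, not parsing problems inside mothur.

```python
def strip_trailing_whitespace(lines):
    """Return the lines with trailing spaces, tabs and carriage returns removed."""
    return [line.rstrip() + "\n" for line in lines]


def fasta_names(lines):
    """Collect sequence names from fasta header lines (those starting with '>')."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def taxonomy_names(lines):
    """Collect the first whitespace-delimited token of each non-empty taxonomy line."""
    names = set()
    for line in lines:
        tokens = line.split()
        if tokens:
            names.add(tokens[0])
    return names


def names_missing_from_taxonomy(fasta_lines, tax_lines):
    """Sorted names present in the fasta file but absent from the taxonomy file."""
    return sorted(fasta_names(fasta_lines) - taxonomy_names(tax_lines))
```

If the second check returns an empty list but classify.seqs still reports every sequence as missing, the names match and the problem is more likely in how the taxonomy lines are formatted.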
