formatting database into mothur format

Hi,

I am a new user of mothur and of this forum.
First of all, I would like to say many thanks to the authors for the software and the detailed wiki. It is great.
My first question: if I have a database, how can I format it into mothur format so that I can then use it for alignment? Is there any special format?

Thanks for your help.

Nothing too special; it just needs to be an aligned, fasta-formatted file.

Dear Patrick Schloss,

Thanks for your answer.
Does the fasta file need to be aligned? I am working on fungal ITS sequences, and they are not easy to align. Does it work without the alignment?
Many thanks.
Laszlo

Yeah, if you’re using ITS then you really can’t do alignment because there isn’t positional homology. I’d suggest using pairwise.seqs or pre.cluster. The next release of mothur will incorporate VSEARCH, which will likely be a great help for people doing ITS. The caveat is that the OTU assignment may not be as good as what you’d get with pairwise.seqs/cluster.

Pat

Dear Patrick Schloss,

We have a highly curated dataset and would like to get assignments at the species level (or even deeper, such as varieties). Do you think this kind of identification is possible with the mothur pipeline? If you are interested, we are happy to share our data and work on it. It would be a great help for the community, especially in the field we are working in.

So I have a fasta formatted reference file derived from silva123 alignment and tried to create the corresponding tax file.

When I run classify.seqs, I get nothing but these error messages:

'U62813.UniAr107' is in your template file and is not in your taxonomy file. Please correct.
'U70679.Unc02vpl' is in your template file and is not in your taxonomy file. Please correct.
'AY344367.Unc02vrl' is in your template file and is not in your taxonomy file. Please correct.
'AY344412.Unc02vro' is in your template file and is not in your taxonomy file. Please correct.
'AY345533.Unc02vrz' is in your template file and is not in your taxonomy file. Please correct.
'AY345543.Unc02vs1' is in your template file and is not in your taxonomy file. Please correct.

Can someone please post a sample of what the fasta and tax files should look like, or a fuller description of their format than this:
“The command requires that you provide a fasta-formatted input and database sequence file and a taxonomy file for the reference sequences” and I think I have done that.

My fasta and tax look like this:

>AY230195.PeuSpec7
TG--GC-C-----------------------------------------------------------------------------------------------------------------------------------------…

AY230195.PeuSpec7 Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Pseudomonadaceae;Pseudomonas;
AY230764.BclSpe16 Bacteria;Firmicutes;Bacilli;Bacillales;Paenibacillaceae;Paenibacillus;

“Silva.bacteria.zip” files look like this:

>AF515816.1

AB000389.1 Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Pseudoalteromonadaceae;Pseudoalteromonas;
AB000699.1 Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas;
AB000700.1 Bacteria;Proteobacteria;Betaproteobacteria;Nitrosomonadales;Nitrosomonadaceae;Nitrosomonas;
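
One way to check the "is in your template file and is not in your taxonomy file" condition independently of mothur is to compare the names in the two files directly. The following Python sketch is only an illustration under stated assumptions: fasta header lines start with ">" and the sequence name is the first whitespace-delimited token after it, and each taxonomy line starts with the sequence name followed by whitespace. The function names are hypothetical and are not part of mothur.

```python
def fasta_names(lines):
    """Collect sequence names: the first token after '>' on each header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def taxonomy_names(lines):
    """Collect the first whitespace-delimited token of each non-empty line."""
    names = set()
    for line in lines:
        tokens = line.split()
        if tokens:
            names.add(tokens[0])
    return names


def missing_from_taxonomy(fasta_lines, tax_lines):
    """Names present in the fasta file but absent from the taxonomy file."""
    return sorted(fasta_names(fasta_lines) - taxonomy_names(tax_lines))
```

Calling missing_from_taxonomy(open("ref.fasta"), open("ref.tax")) with your own file names (the ones here are placeholders) returns an empty list when every reference sequence has a taxonomy entry. If it is empty and mothur still complains, the names match and the problem is more likely in how the taxonomy lines themselves are formatted.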

That type of error usually occurs when there are spaces in the taxonomy. For example if you had something like:

seq1 D_0__Bacteria;D_1__Bacteroidetes;D_2__Sphingobacteriia;D_3__Sphingobacteriales;D_4__env.OPS 17;D_5__uncultured Bacteroidetes bacterium;D_6__uncultured Bacteroidetes bacterium;

D_6__uncultured Bacteroidetes bacterium contains spaces.
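
A small script can look for this outside mothur. The Python sketch below is only an illustration, not part of mothur: it assumes each taxonomy line is a sequence name, then whitespace (normally a tab), then the taxonomy string, and it reports every entry whose taxonomy string still contains whitespace, including whitespace at the very end of the line. The function name is hypothetical.

```python
def find_whitespace_in_taxonomy(lines):
    """Return (line_number, name, taxonomy) for entries whose taxonomy
    string contains any whitespace, either inside it or at its end."""
    hits = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")  # drop the newline, keep other trailing whitespace
        if not line.strip():
            continue  # skip blank lines
        # Split only at the first run of whitespace: name, then the rest.
        parts = line.split(None, 1)
        name = parts[0]
        taxonomy = parts[1] if len(parts) > 1 else ""
        if any(ch.isspace() for ch in taxonomy):
            hits.append((number, name, taxonomy))
    return hits


def report(path):
    """Print offending entries; repr() makes trailing spaces visible."""
    with open(path) as handle:
        for number, name, taxonomy in find_whitespace_in_taxonomy(handle):
            print(f"line {number}: {name} -> {taxonomy!r}")
```

Printing the taxonomy with repr() is deliberate: a space at the end of a line is easy to miss in a plain listing.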

You can find these issues with the debug flag. Setting the debug flag will allow you to see what mothur is reading from the taxonomy file.

mothur > set.dir(debug=t)

NOTE: In version 1.39.0 mothur will be able to handle spaces in the taxonomy.

Sarah,

I set the debug flag and I get errors for probably every line in my file. I grepped for spaces and found none. Many of the errors are that there’s a final semicolon missing.

cat silva.v6.tax | grep ";" | wc
14914 33037 1667572

cat silva.v6.tax | grep -v ";" | wc
0 0 0

less mothur.1473862203.logfile | grep -v DEBUG | grep "is in your template file and is not in your taxonomy file. Please correct" | wc
14914 223710 1413841

UPDATE: I believe the problem was spaces at the END of the lines; I am about to try again.
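
For anyone hitting the same thing, here is a minimal Python sketch of one way to clean such a file. It is an illustration under assumptions, not a mothur tool: it assumes one entry per line with the sequence name first, writes the name and taxonomy separated by a single tab, strips whitespace at the end of the line, and appends the final semicolon if it is missing. Spaces inside the taxonomy string are left untouched, since how to replace them is a separate choice. The function names and file names are hypothetical.

```python
def clean_taxonomy_line(line):
    """Strip surrounding whitespace and make sure the taxonomy ends with ';'."""
    stripped = line.strip()
    if not stripped:
        return ""  # blank lines become empty strings
    parts = stripped.split(None, 1)  # name, then the rest of the line
    name = parts[0]
    taxonomy = parts[1] if len(parts) > 1 else ""
    if taxonomy and not taxonomy.endswith(";"):
        taxonomy += ";"
    return f"{name}\t{taxonomy}"


def clean_taxonomy_file(src_path, dst_path):
    """Write a cleaned copy of src_path to dst_path, skipping blank lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            cleaned = clean_taxonomy_line(line)
            if cleaned:
                dst.write(cleaned + "\n")
```

For example, clean_taxonomy_file("silva.v6.tax", "silva.v6.clean.tax") writes a cleaned copy and leaves the original file unchanged; the output name is just a placeholder.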