I am a new user of mothur, and of the forum too.
First of all, I would like to say many thanks to the authors for the software and the detailed wiki. It is great.
My first question: if I have a database, how can I convert it into mothur's format so that I can then use it for the alignment? Is there a special format?
Thanks for your answer.
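A minimal Python sketch of one way to write such a reference pair. It assumes the usual mothur reference layout: a fasta template whose sequence names match the first column of a tab-separated taxonomy file, with every taxonomy level (including the last) followed by a semicolon. The record list and the names `to_fasta`, `to_taxonomy` and `_check_token` are hypothetical helpers, not part of mothur, and rejecting whitespace is an assumption based on the problems reported later in this thread.

```python
from typing import List, Tuple

Record = Tuple[str, List[str], str]  # hypothetical: (name, taxonomy levels, sequence)


def _check_token(token: str, what: str) -> None:
    # Assumption: names and levels must be non-empty, contain no whitespace,
    # and must not contain the ';' separator themselves.
    if not token or any(ch.isspace() for ch in token) or ";" in token:
        raise ValueError(f"bad {what}: {token!r}")


def to_fasta(records: List[Record]) -> str:
    out = []
    for name, _levels, seq in records:
        _check_token(name, "name")
        out.append(f">{name}")
        out.append(seq)
    return "\n".join(out) + "\n"


def to_taxonomy(records: List[Record]) -> str:
    out = []
    for name, levels, _seq in records:
        _check_token(name, "name")
        for level in levels:
            _check_token(level, "taxonomy level")
        # Every level, including the last one, is followed by ';'.
        lineage = "".join(f"{level};" for level in levels)
        out.append(f"{name}\t{lineage}")  # single tab between the two columns
    return "\n".join(out) + "\n"


records = [("seq1", ["Fungi", "Ascomycota", "Sordariomycetes"], "ACGTACGT")]
print(to_fasta(records), end="")
print(to_taxonomy(records), end="")
```

The sequences are written exactly as given, so this sketch does not decide whether the template should be aligned.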
Does the fasta file need to be aligned? I am working on fungal ITS sequences, which are not easy to align. Does it work without an alignment?
Many thanks.
Laszlo
Yeah, if you're using ITS, then you really can't do an alignment because there isn't positional homology. I'd suggest using pairwise.seqs or pre.cluster. The next release of mothur will incorporate VSEARCH, which will likely be a great help for people working with ITS. The caveat is that the OTU assignment may not be as good as what you'd get with pairwise.seqs/cluster.
We have a highly curated dataset, and we would like to get assignments at the species level (or even deeper, such as varieties). Do you think this kind of identification is possible with the mothur pipeline? If you are interested, we are happy to share our data and work on it together. It would be a great help for the community, especially in the field we are working in.
I have a fasta-formatted reference file derived from the silva123 alignment, and I tried to create the corresponding tax file.
When I run classify.seqs, I get nothing but these error messages:
‘U62813.UniAr107’ is in your template file and is not in your taxonomy file. Please correct.
‘U70679.Unc02vpl’ is in your template file and is not in your taxonomy file. Please correct.
‘AY344367.Unc02vrl’ is in your template file and is not in your taxonomy file. Please correct.
‘AY344412.Unc02vro’ is in your template file and is not in your taxonomy file. Please correct.
‘AY345533.Unc02vrz’ is in your template file and is not in your taxonomy file. Please correct.
‘AY345543.Unc02vs1’ is in your template file and is not in your taxonomy file. Please correct.
Can someone please post a sample of what the fasta and tax files should look like, or a fuller description of their format than this:
“The command requires that you provide a fasta-formatted input and database sequence file and a taxonomy file for the reference sequences.” I think I have done that.
I set the debug flag, and I get errors for what is probably every line in my file. I grepped for spaces and found none. Many of the errors say that a final semicolon is missing.
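One way to find these problems before running classify.seqs is a small consistency check. This Python sketch assumes the taxonomy file has exactly two tab-separated columns (name, then semicolon-separated taxonomy ending in ";") and that the fasta name is the first whitespace-delimited token after ">". The function names are hypothetical, and the checks mirror only the issues mentioned in this thread (names missing from the taxonomy, whitespace, missing final semicolon); they are not mothur's own validation.

```python
from typing import Dict, List, Tuple


def fasta_names(fasta_text: str) -> List[str]:
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            # Assumption: the sequence name is the first whitespace-delimited token.
            names.append(tokens[0] if tokens else "")
    return names


def check_taxonomy(tax_text: str) -> Tuple[Dict[str, str], List[str]]:
    taxonomy: Dict[str, str] = {}
    problems: List[str] = []
    for lineno, line in enumerate(tax_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        if line != line.rstrip():
            problems.append(f"line {lineno}: trailing whitespace")
        fields = line.rstrip().split("\t")
        if len(fields) != 2:
            problems.append(f"line {lineno}: expected 2 tab-separated columns, found {len(fields)}")
            continue
        name, lineage = fields
        if any(ch.isspace() for ch in name):
            problems.append(f"line {lineno}: whitespace inside name")
        if any(ch.isspace() for ch in lineage):
            problems.append(f"line {lineno}: whitespace inside taxonomy")
        if not lineage.endswith(";"):
            problems.append(f"line {lineno}: missing final semicolon")
        taxonomy[name] = lineage
    return taxonomy, problems


def check_reference(fasta_text: str, tax_text: str) -> List[str]:
    taxonomy, problems = check_taxonomy(tax_text)
    for name in fasta_names(fasta_text):
        if name not in taxonomy:
            problems.append(f"'{name}' is in the fasta file but not in the taxonomy file")
    return problems
```

Usage would be something like `check_reference(open("ref.fasta").read(), open("ref.tax").read())`, where both file names are placeholders. Note that `line != line.rstrip()` flags trailing spaces or tabs that a plain search for spaces inside the taxonomy string could miss.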
grep -v DEBUG mothur.1473862203.logfile | grep "is in your template file and is not in your taxonomy file. Please correct" | wc
14914 223710 1413841
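The pipeline above only counts the matching lines. A Python sketch of the same filter that also collects the reported sequence names, so they can be looked up in the taxonomy file, is shown below. It assumes each error sits on its own log line with the name in straight or curly single quotes, as in the messages quoted earlier; the names `MISSING` and `missing_from_taxonomy` are hypothetical.

```python
import re
from typing import List

# Accepts straight or curly single quotes around the sequence name.
MISSING = re.compile(
    r"[\u2018'](\S+?)[\u2019'] is in your template file and is not in your taxonomy file"
)


def missing_from_taxonomy(log_text: str) -> List[str]:
    """Return the names reported as missing, skipping DEBUG lines like grep -v DEBUG."""
    names = []
    for line in log_text.splitlines():
        if "DEBUG" in line:
            continue
        match = MISSING.search(line)
        if match:
            names.append(match.group(1))
    return names
```

`len(missing_from_taxonomy(text))` plays the role of the first number printed by `wc`.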
UPDATE: I believe the problem was spaces at the END of the lines; I am about to try again.
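If trailing spaces and missing final semicolons are the cause, a cleanup pass over the taxonomy file could look like this Python sketch. It assumes the same two-column, tab-separated layout. It only strips whitespace at the ends of each line and column and appends a ";" where one is missing. Spaces inside the taxonomy string are left alone. `clean_taxonomy` is a hypothetical helper.

```python
def clean_taxonomy(tax_text: str) -> str:
    cleaned = []
    for lineno, raw in enumerate(tax_text.splitlines(), start=1):
        line = raw.strip()  # drops trailing spaces/tabs (and a stray '\r')
        if not line:
            continue  # skip blank lines
        if "\t" not in line:
            raise ValueError(f"line {lineno}: no tab between name and taxonomy")
        name, lineage = line.split("\t", 1)
        lineage = lineage.strip()
        if not lineage.endswith(";"):
            lineage += ";"  # add the missing final semicolon
        cleaned.append(f"{name.strip()}\t{lineage}")
    return "\n".join(cleaned) + ("\n" if cleaned else "")
```

Writing the result to a new file and re-running classify.seqs against it keeps the original tax file untouched for comparison.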