Thanks, dwaite.
I tested with just one class (Maxillopoda) downloaded from BOLD and formatted along the lines you suggested. I had to add a “;” to the end of every line of the tax file that lacked one (otherwise mothur flagged it as an error), and I had to make the IDs in the taxonomy file line up with those in the fasta database. But even after using some Perl to keep only the lines with IDs common to both files, I still have this issue: during classify.seqs, mothur can’t seem to match the IDs between the taxonomy and the fasta.
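For reference, the clean-up described above (keep only the IDs present in both files, and make every lineage end in “;”) could be sketched roughly like this in Python. It is an illustration, not the Perl actually used. The function names are made up, the ID is assumed to be the first whitespace-delimited token on each line, and the output uses a tab between ID and lineage on the assumption that this is what mothur expects.

```python
def fasta_ids(fasta_text):
    """Return the set of sequence IDs (first token of each '>' header line)."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def filter_taxonomy(tax_text, keep_ids):
    """Keep taxonomy lines whose ID is in keep_ids; make each lineage end with ';'."""
    kept = []
    for line in tax_text.splitlines():
        # Split only once so that spaces inside the lineage are preserved.
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank lines and lines with an ID but no lineage
        seq_id, lineage = parts
        if seq_id not in keep_ids:
            continue
        if not lineage.endswith(";"):
            lineage += ";"
        # Assumption: ID and lineage are written tab-separated.
        kept.append(f"{seq_id}\t{lineage}")
    return kept
```

With the file contents read into strings, `filter_taxonomy(tax_text, fasta_ids(fasta_text))` returns the cleaned taxonomy lines, ready to be written back out one per line.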
classify.seqs (fasta= , template=, reference=)
…
'GBCX4624-15' is in your template file and is not in your taxonomy file. Please correct.
'SLAVA126-11' is in your template file and is not in your taxonomy file. Please correct.
'BIPOL456-10' is in your template file and is not in your taxonomy file. Please correct.
'GBA11172-13' is in your template file and is not in your taxonomy file. Please correct.
DONE.
It took 3 seconds get probabilities.
Note: I left the file names out of the classify.seqs call above for simplicity. The files do exist, of course, and there are snippets of them below.
Anyway, the run completes, but I only get the tax.sum file and no .taxonomy file. Why would I get only the tax.sum?
FYI, it looks like this:
~/Downloads# head Final.fix.tree.sum
#1.39.5
2750
7
0 3
Root
2 Arthropoda
1208 unclassified
1 unknown
1 1
When I grep for those sequence names, they are indeed in the taxonomy file, which is puzzling.
~/Downloads# grep -n GBCX4624-15 Final.fix.tax
18493:GBCX4624-15 Arthropoda;Maxillopoda;Sessilia;Archaeobalanidae;Acastinae;Conopea;Conopea sp.
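Since grep matches substrings, a stricter check is to compare the ID columns of the two files exactly. A rough Python sketch of that idea follows. The function name `missing_from_taxonomy` is only illustrative, and it assumes the ID is the first whitespace-delimited token of each fasta header and each taxonomy line.

```python
def missing_from_taxonomy(fasta_text, tax_text):
    """Return fasta IDs that have no exactly matching ID in the taxonomy text."""
    fasta_ids = {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].split()
    }
    tax_ids = {
        line.split()[0]
        for line in tax_text.splitlines()
        if line.split()  # ignore blank lines
    }
    # Set difference: IDs seen in the fasta but never as a taxonomy ID.
    return sorted(fasta_ids - tax_ids)
```

An empty result would mean every fasta ID has an exact counterpart in the taxonomy file, at least under the splitting assumption above.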
Here is what my final tax/template files look like:
~/Downloads# head Final.fix.tax
GBA1955-07 Arthropoda;Maxillopoda;Sessilia;Balanidae; ;Balanus;Balanus glandula;
GBA1983-07 Arthropoda;Maxillopoda;Sessilia;Balanidae; ;Balanus;Balanus glandula;
GBA2013-07 Arthropoda;Maxillopoda;Sessilia;Balanidae; ;Balanus;Balanus glandula;
GBA4315-09 Arthropoda;Maxillopoda;Sessilia;Balanidae; ;Balanus;Balanus glandula;
GBA4369-09 Arthropoda;Maxillopoda;Kentrogonida;Sacculinidae; ;Heterosaccus;Heterosaccus californicus;
GBCX0185-06 Arthropoda;Maxillopoda;Calanoida;Metridinidae; ;Metridia;Metridia gerlachei;
GBCX0186-06 Arthropoda;Maxillopoda;Calanoida;Pseudodiaptomidae; ;Pseudodiaptomus;Pseudodiaptomus nihonkaiensis;
GBCX0194-06 Arthropoda;Maxillopoda;Calanoida;Temoridae; ;Eurytemora;Eurytemora pacifica;
GBCX0195-06 Arthropoda;Maxillopoda;Calanoida;Pontellidae; ;Labidocera;Labidocera rotunda;
GBCX0196-06 Arthropoda;Maxillopoda;Calanoida;Tortanidae; ;Tortanus;Tortanus dextrilobatus;
~/Downloads# head Final.fasta
>GBA1955-07
------------------------------------------------------------
------------------------------------CTTATTCGGGCTGAACTTGGTCAA
CCAGGTAGACTGATTGGAGAT---GATCAGATTTACAATGTAATTGTTACTGCTCATGCT
TTTATTATGATTTTTTTCATAGTTATACCTATTATAATTGGGGGTTTTGGTAATTGATTA
CTTCCATTAATATTAGGAGCTCCTGATATAGCTTTTCCACGTCTTAATAATATAAGTTTT
TGGCTATTACCCCCAGCTTTAATATTGTTGATTAGAGGATCATTAGTAGAAGCTGGAGCT
GGTACTGGATGGACAGTTTACCCTCCTTTATCGAGAAATATTGCCCATTCAGGAGCATCG
GTAGATTTATCTATTTTTTCTCTCCATTTAGCTGGAGCTTCATCTATTCTTGGGGCCATT
AATTTTATATCGACAGTTATTAAT------------------------------------
Now, I know some tax lines end strangely, with numbers, and lack the closing “;”, but mothur appears to ‘ignore’ them. Anyway, I’m not sure where to go from here. Are these tax/template files close to correct? Or why might classify.seqs be failing?
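For what it’s worth, the tax lines that still lack the closing “;” could be listed with something like the following Python sketch. The function name is hypothetical, and it only checks the last non-whitespace character of each line, nothing else about the format.

```python
def lines_missing_semicolon(tax_text):
    """Return (line_number, line) pairs for non-blank lines not ending in ';'."""
    flagged = []
    for number, line in enumerate(tax_text.splitlines(), start=1):
        stripped = line.rstrip()
        if stripped and not stripped.endswith(";"):
            flagged.append((number, stripped))
    return flagged
```

Line numbers are 1-based, so they can be compared directly with `grep -n` output.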