Hello,
I should start by saying I’m new to this, and I’m trying to follow the MiSeq protocol through to completion before diving into the details.
I’m running into an error with the classify.seqs() command and I’m not sure why it occurs. The .align and .tax files are the reference files mothur recommends. The error is below; I’ve copied the head and tail of the output.
mothur > classify.seqs(fasta=04test.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.fasta, count=04test.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.count_table, reference=silva.v4.fasta, taxonomy=silva.nr_v138_2.tax)
Using 32 processors.
Generating search database... DONE.
It took 28 seconds generate search database.
Reading in the silva.nr_v138_2.tax taxonomy... [ERROR]: KC189639.Unc81268 is missing the final ';', ignoring.
[ERROR]: JQ769778.Unc88719 is missing the final ';', ignoring.
[ERROR]: LN612929.UncR5461 is missing the final ';', ignoring.
[ERROR]: KC439348.E88Spec8 is missing the final ';', ignoring.
[ERROR]: JQ684271.Unc76929 is missing the final ';', ignoring.
[ERROR]: FPLK01001426.GJ6Z3519 is missing the final ';', ignoring.
[ERROR]: FJ230802.Unc66247 is missing the final ';', ignoring.
[ERROR]: KC683071.Unc09sfs is missing the final ';', ignoring.
[ERROR]: KC683078.Unc66555 is missing the final ';', ignoring.
[ERROR]: AY221059.Unc77530 is missing the final ';', ignoring.
**** Exceeded maximum allowed command errors, quitting ****
[ERROR]: KC683124.Unc09sjy is missing the final ';', ignoring.
DONE.
'KM213004.Unc55991' is in your template file and is not in your taxonomy file. Please correct.
'KC247157.HZLGynue' is in your template file and is not in your taxonomy file. Please correct.
'DQ181686.UncCy124' is in your template file and is not in your taxonomy file. Please correct.
…
Couple thousand lines or so later…
…
'GU437694.Unc46467' is in your template file and is not in your taxonomy file. Please correct.
'FJ800528.Unc47314' is in your template file and is not in your taxonomy file. Please correct.
'KF037397.Unc57379' is in your template file and is not in your taxonomy file. Please correct.
'FJ538172.UncCl239' is in your template file and is not in your taxonomy file. Please correct.
DONE.
It took 35 seconds get probabilities.
mothur >
The commands I’ve run are below:
make.file(inputdir=., type=fastq, prefix=04test)
make.contigs(inputdir=., outputdir=., trimoverlap=T, file=04test.files, pdiffs=2, checkorient=t)
summary.seqs(fasta=04test.trim.contigs.fasta)
screen.seqs(fasta=04test.trim.contigs.fasta, count=04test.contigs.count_table, maxambig=0, maxlength=275, maxhomop=8)
summary.seqs(fasta=04test.trim.contigs.good.fasta, count=04test.contigs.good.count_table)
unique.seqs(fasta=04test.trim.contigs.good.fasta, count=04test.contigs.good.count_table)
summary.seqs(fasta=04test.trim.contigs.good.fasta, count=04test.contigs.good.count_table)
pcr.seqs(fasta=silva.nr_v138_2.align, start=11895, end=25318, keepdots=F)
rename.file(input=silva.nr_v138_2.pcr.align, new=silva.v4.fasta)
align.seqs(fasta=04test.trim.contigs.good.unique.fasta, reference=silva.v4.fasta)
summary.seqs(fasta=04test.trim.contigs.good.unique.align, count=04test.trim.contigs.good.count_table)
screen.seqs(fasta=04test.trim.contigs.good.unique.align, count=04test.trim.contigs.good.count_table, start=1977, end=11546)
summary.seqs(fasta=current, count=current)
filter.seqs(fasta=04test.trim.contigs.good.unique.good.align, vertical=T, trump=.)
unique.seqs(fasta=04test.trim.contigs.good.unique.good.filter.fasta, count=04test.trim.contigs.good.good.count_table)
summary.seqs(fasta=current, count=current)
pre.cluster(fasta=04test.trim.contigs.good.unique.good.filter.unique.fasta, count=04test.trim.contigs.good.unique.good.filter.count_table, diffs=2)
summary.seqs(fasta=04test.trim.contigs.good.unique.good.filter.unique.precluster.fasta)
chimera.vsearch(fasta=04test.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=04test.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)
classify.seqs(fasta=04test.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.fasta, count=04test.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.count_table, reference=silva.v4.fasta, taxonomy=silva.nr_v138_2.tax)
I suspect the error results from mismatched lineages between the taxonomy and alignment files. I ran some grep searches on both files and quickly found discrepancies. Compared with the taxonomy file, the alignment file (silva.v4.fasta) has a ">" prefix on each ID, spaces instead of underscores, and a more complete taxonomic lineage. See below:
grep "KC189639.Unc81268" silva.v4.fasta
>KC189639.Unc81268 93.31 Bacteria;Pseudomonadota;Alphaproteobacteria;Hyphomicrobiales;Hyphomicrobiales Incertae Sedis;Incertae Sedis;
grep "KC189639.Unc81268" silva.nr_v138_2.tax
KC189639.Unc81268 Bacteria;Pseudomonadota;Alphaproteobacteria;Hyphomicrobiales;Hyphomicrobiales_
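Rather than checking one ID at a time with grep, both symptoms in the log (lineages missing the final ';' and IDs present in the template but not in the taxonomy) could be counted across the whole files. The following is only a minimal sketch: the function names are made up for illustration, and it assumes each taxonomy line is an ID, whitespace, then the lineage, and that each FASTA header starts with ">" followed by the ID.

```python
def read_tax_ids(tax_lines):
    """Split taxonomy IDs into those whose lineage ends with ';' and those that do not."""
    ok, bad = set(), set()
    for line in tax_lines:
        if not line.strip():
            continue  # skip blank lines
        parts = line.strip().split(None, 1)  # ID, then the rest of the line
        seq_id = parts[0]
        lineage = parts[1].strip() if len(parts) > 1 else ""
        if lineage.endswith(";"):
            ok.add(seq_id)
        else:
            bad.add(seq_id)
    return ok, bad


def read_fasta_ids(fasta_lines):
    """Collect the first token after '>' from every FASTA header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">") and len(line.strip()) > 1:
            ids.add(line[1:].split()[0])
    return ids


def compare(fasta_lines, tax_lines):
    """Return (IDs lacking a final ';', template IDs with no taxonomy line at all)."""
    ok, bad = read_tax_ids(tax_lines)
    fasta_ids = read_fasta_ids(fasta_lines)
    missing = fasta_ids - ok - bad
    return bad, missing


def check_files(fasta_path, tax_path):
    """Hypothetical helper: run the comparison on two files and print the counts."""
    with open(fasta_path) as fasta, open(tax_path) as tax:
        bad, missing = compare(fasta, tax)
    print(f"{len(bad)} taxonomy lines lack the final ';'")
    print(f"{len(missing)} template IDs have no taxonomy line")
    return bad, missing
```

Calling check_files("silva.v4.fasta", "silva.nr_v138_2.tax") would then show whether the problem affects a handful of entries or a large share of the file.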
So then I searched for “_” in the taxonomy file.
grep "_" ./silva.v4.fasta
grep "_" ./silva.nr_v138_2.tax
AB824402.UncB2940 Bacteria;Bacillota;Clostridia;Lachnospirales;Lachnospiraceae;Lachnospiraceae_NK4A136_group;
AF005457.Di2Litto Eukaryota;Arthropoda;Insecta;Archaeognatha;Archaeognatha_fa;Archaeognatha_ge;
EF465492.FrrCapuc Eukaryota;Diatomea;Coscinodiscophytina_cl;Fragilariales;Fragilariales_fa;Fragilaria;
.....Couple thousand lines later.....
EF032753.UncA4888 Bacteria;Acidobacteriota;Acidobacteriae;Subgroup_2;
CP030993.AraHy387 Eukaryota;Phragmoplastophyta;Embryophyta;Fabales;Fabales_fa;Arachis;
EF032777.Unc59335 Bacteria;Verrucomicrobiota;Omnitrophia;Omnitrophales;Omnitrophaceae;Candidatus_Omnitrophus;
MF034602.HazBasi3 Eukaryota;Chlorophyta_ph;Ulvophyceae;Ulotrichales;Ulotrichales_fa;Hazenia;
LC081127.A8NRugos Eukaryota;Scalidophora;Kinorhyncha_cl;Homalorhagida;Homalorhagida_fa;Homalorhagida_ge;
HQ384693.D2ECaryo Eukaryota;Phragmoplastophyta;Embryophyta;Solanales;Solanales_fa;Montinia;
So something’s up with the taxonomy file: the lineages appear to have been altered somehow, and the ‘_fa’ suffixes don’t look correct to me.
I’m assuming the taxonomy in the two files should match, so I considered writing a script that copies the lineages from the alignment file into the taxonomy file, matched on the unique identifiers. Since I’m new to this, though, I don’t know what other errors that might introduce.
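A minimal sketch of that kind of script is below, as an illustration only. It assumes every FASTA header has the form shown in the grep output above (">ID", then a score, then the lineage), that spaces inside taxon names should become underscores, and that each output line should be the ID, a tab, and a lineage ending in ';'. The function names are hypothetical, and whether mothur accepts the resulting file is untested.

```python
def header_to_tax_line(header):
    """Turn one FASTA header into a taxonomy line, or None if it lacks a lineage."""
    fields = header[1:].strip().split(None, 2)  # ID, score, lineage (may contain spaces)
    if len(fields) < 3:
        return None
    seq_id, _score, lineage = fields
    lineage = lineage.strip().replace(" ", "_")  # assumption: underscores replace spaces
    if not lineage.endswith(";"):
        lineage += ";"
    return f"{seq_id}\t{lineage}"


def rebuild_taxonomy(fasta_lines):
    """Yield one taxonomy line per FASTA header that carries a lineage."""
    for line in fasta_lines:
        if line.startswith(">"):
            tax_line = header_to_tax_line(line)
            if tax_line is not None:
                yield tax_line


def write_taxonomy(fasta_path, out_path):
    """Hypothetical helper: write the rebuilt taxonomy to a new file."""
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for tax_line in rebuild_taxonomy(fasta):
            out.write(tax_line + "\n")
```

Writing to a new file rather than overwriting silva.nr_v138_2.tax keeps the downloaded reference intact for comparison.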
I suspect the lineage mismatch is why classify.seqs() isn’t working. I reran the pipeline with another person’s code, whose mothur.log file indicated it had worked for them, yet I ran into the same issue. I also re-downloaded the reference files, with no change in the result.
Any help is appreciated, thanks!
Update: I ran the same script with the Silva 138.1 alignment and taxonomy files. I got some [WARNING] messages but no [ERROR] messages from classify.seqs(), so it appears to work with the older reference files. Output below:
mothur > classify.seqs(fasta=04test.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.fasta, count=04test.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.count_table, reference=silva.v4.fasta, taxonomy=silva.nr_v138_1.tax)
Using 32 processors.
Reading template taxonomy... DONE.
Reading template probabilities... DONE.
It took 6 seconds get probabilities.
Classifying sequences from 04test.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.fasta ...
[WARNING]: M03075_700_000000000-LFHRK_1_1111_24969_25468 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M03075_700_000000000-LFHRK_1_1111_13672_3211 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M03075_700_000000000-LFHRK_1_2108_17191_22419 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M03075_700_000000000-LFHRK_1_2111_4016_15124 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M03075_700_000000000-LFHRK_1_2110_16130_25216 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M03075_700_000000000-LFHRK_1_2102_14024_6183 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M03075_700_000000000-LFHRK_1_1105_4578_15010 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M03075_700_000000000-LFHRK_1_2105_23713_24630 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M03075_700_000000000-LFHRK_1_1103_22850_15726 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M03075_700_000000000-LFHRK_1_1101_27262_14088 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
**** Exceeded maximum allowed command warnings, silencing warnings ****
[WARNING]: M03075_700_000000000-LFHRK_1_1110_26246_17600 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M03075_700_000000000-LFHRK_1_1114_27275_13949 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M03075_700_000000000-LFHRK_1_1112_10395_19553 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
....... 100 or so lines later.........
157
[WARNING]: M03075_700_000000000-LFHRK_1_2107_8852_16030 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
157
157
156
157
157
157
157
157
156
157
156
157
157
157
157
157
It took 6 secs to classify 5015 sequences.
It took 0 secs to create the summary file for 5015 sequences.
Output File Names:
04test.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.nr_v138_1.wang.taxonomy
04test.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.nr_v138_1.wang.tax.summary