classify.seqs

Hi,
I am trying to run classify.seqs with the latest RDP reference files, but I end up with the following:

mothur > classify.seqs(fasta=current, name=current, group=current, template=trainset14_032015.pds.fasta, taxonomy=trainset14_032015.pds.tax, cutoff=60, processors=2)
Using soil_leaf_root_16s.trim.unique.good.filter.unique.precluster.pick.fasta as input file for the fasta parameter.
Using soil_leaf_root_16s.good.pick.groups as input file for the group parameter.
Using soil_leaf_root_16s.trim.unique.good.filter.unique.precluster.pick.names as input file for the name parameter.

Using 2 processors.
Generating search database… DONE.
It took 7 seconds generate search database.

Reading in the trainset14_032015.pds.tax taxonomy… [ERROR]: FR749980_S002350843 is missing the final ‘;’, ignoring.
[ERROR]: ; is missing the final ‘;’, ignoring.
[ERROR]: GL982576_U010303693 is already in your taxonomy file, names must be unique.
DONE.
‘DQ343153_S000640727’ is in your template file and is not in your taxonomy file. Please correct.
‘EU928765_S001872839’ is in your template file and is not in your taxonomy file. Please correct.
‘AY639887_S000333610’ is in your template file and is not in your taxonomy file. Please correct.
‘EU167539_S001044475’ is in your template file and is not in your taxonomy file. Please correct.
‘CP002363_S004071194’ is in your template file and is not in your taxonomy file. Please correct.
‘AY264344_S000540507’ is in your template file and is not in your taxonomy file. Please correct.
‘AF250331_S000390067’ is in your template file and is not in your taxonomy file. Please correct.
‘EF495227_S000859418’ is in your template file and is not in your taxonomy file. Please correct.
‘CP000155_S000624651’ is in your template file and is not in your taxonomy file. Please correct.
‘AY676463_S000422111’ is in your template file and is not in your taxonomy file. Please correct.
‘AF152107_S000388242’ is in your template file and is not in your taxonomy file. Please correct.
‘AF152106_S000388241’ is in your template file and is not in your taxonomy file. Please correct.
‘AB200231_S000470086’ is in your template file and is not in your taxonomy file. Please correct.
‘AB200233_S000470088’ is in your template file and is not in your taxonomy file. Please correct.
‘AB200232_S000470087’ is in your template file and is not in your taxonomy file. Please correct.
‘DQ360414_S000651067’ is in your template file and is not in your taxonomy file. Please correct.
‘AF078765_S000428521’ is in your template file and is not in your taxonomy file. Please correct.
...
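The log above contains three kinds of complaint: a lineage missing its final ';', a duplicated name, and names in the template FASTA that are absent from the taxonomy file. These can be checked outside mothur before running classify.seqs. The sketch below is a hypothetical helper, not part of mothur. It assumes the taxonomy file has one tab-separated `name<TAB>lineage` record per line and that the sequence name is the first word of each FASTA header.

```python
"""Hypothetical sanity check for a reference FASTA + taxonomy pair.

Assumptions (not mothur's own code):
- taxonomy file: one record per line, "<name>\t<taxon1>;<taxon2>;...;"
- FASTA headers: ">name" optionally followed by whitespace and a description
"""
import sys


def read_taxonomy(lines):
    """Return (set of names, list of problems) for the taxonomy lines."""
    seen = set()
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        name, sep, lineage = line.partition("\t")
        if not sep:
            problems.append(f"line {lineno}: no tab between name and taxonomy")
            continue
        if not lineage.rstrip().endswith(";"):
            problems.append(f"{name}: missing the final ';'")
        if name in seen:
            problems.append(f"{name}: duplicate name")
        seen.add(name)
    return seen, problems


def read_fasta_names(lines):
    """Return the first word of every '>' header line."""
    names = []
    for raw in lines:
        if raw.startswith(">"):
            header = raw[1:].strip()
            names.append(header.split()[0] if header else "")
    return names


def check_pair(fasta_lines, tax_lines):
    """List every taxonomy problem plus every FASTA name missing from the taxonomy."""
    tax_names, problems = read_taxonomy(tax_lines)
    for name in read_fasta_names(fasta_lines):
        if name not in tax_names:
            problems.append(f"{name}: in template file but not in taxonomy file")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_fa, open(sys.argv[2]) as f_tax:
        for problem in check_pair(f_fa, f_tax):
            print(problem)
```

Run as, for example, `python check_reference.py trainset14_032015.pds.fasta trainset14_032015.pds.tax` (the script name is hypothetical). An empty output means none of these three problems were found under the stated format assumptions.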

Hi all,

I had the same problem with version 14. I was able to run classify.seqs successfully using version 10.

With version 14, I ran:

classify.seqs(fasta=stability.files.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.files.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, reference=trainset14_032015.pds.fasta, taxonomy=trainset14_032015.pds.tax, cutoff=80)

And got:…

‘AC150267.38366.40176’ is in your template file and is not in your taxonomy file. Please correct.
‘AC170254.35410.37285’ is in your template file and is not in your taxonomy file. Please correct.
‘HM156711.1.1831’ is in your template file and is not in your taxonomy file. Please correct.
DONE.
It took 16 seconds get probabilities.

Hi,

I have the same problem as described above: classify.seqs works well with version 10, but not with version 14.

Katharina

Sorry about that; it seems that the original files from the RDP contained the duplicate. If you try again with the updated files, everything should work now. Thanks for the heads-up!

Pat

Hi Pat,

Thank you for the quick response. When I run classify.seqs with the new file, it classifies all of my sequences as chloroplast/eukaryotic/etc., and remove.lineage then removes all of them (output below). That didn't happen with the older version. Would it have changed that much, or could I be missing something somewhere upstream?

Corianne

mothur > classify.seqs(fasta=stability.files.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.files.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, reference=trainset14_032015.pds.fasta, taxonomy=trainset14_032015.pds.tax, cutoff=80)

[WARNING]: M01895_15_000000000-AAHFF_1_1119_22623_17595 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M01895_15_000000000-AAHFF_1_1119_21639_8437 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.

[WARNING]: mothur reversed some your sequences for a better classification. If you would like to take a closer look, please check stability.files.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.flip.accnos for the list of the sequences.

It took 2485 secs to classify 68303 sequences.

It took 6 secs to create the summary file for 68303 sequences.

Output File Names:
stability.files.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy
stability.files.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.tax.summary
stability.files.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.flip.accnos

mothur > remove.lineage(fasta=stability.files.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.files.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=stability.files.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)
Using stability.files.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table as input file for the count parameter.
Using stability.files.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta as input file for the fasta parameter.

[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.

Your taxonomy file contains only sequences from Chloroplast-Mitochondria-unknown-Archaea-Eukaryota.
Your fasta file contains only sequences from Chloroplast-Mitochondria-unknown-Archaea-Eukaryota.
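To see which of the removed taxa each sequence actually matched, a rough tally of the .wang.taxonomy file can help narrow down where the problem lies. The sketch below is a hypothetical diagnostic and only approximates mothur's matching rules. It assumes tab-separated lines of the form `name<TAB>Taxon(confidence);Taxon(confidence);...;`, treats the `taxon=` value as a hyphen-separated list as in the command above, and counts a sequence under the first listed taxon that appears exactly as one of its lineage levels.

```python
"""Hypothetical approximation of a remove.lineage-style filter (not mothur's code).

Strips "(confidence)" suffixes, splits each lineage on ';', and reports
the first taxon from the removal list (in list order) that equals a level.
"""
import re
import sys
from collections import Counter

CONFIDENCE = re.compile(r"\(\d+(?:\.\d+)?\)$")


def lineage_levels(lineage):
    """'Bacteria(100);Cyanobacteria(98);' -> ['Bacteria', 'Cyanobacteria']"""
    levels = []
    for level in lineage.strip().split(";"):
        level = CONFIDENCE.sub("", level.strip())
        if level:
            levels.append(level)
    return levels


def tally_matches(tax_lines, taxa):
    """Count sequences per matched removal taxon; unmatched ones count as '(kept)'."""
    counts = Counter()
    for raw in tax_lines:
        name, sep, lineage = raw.rstrip("\n").partition("\t")
        if not sep:
            continue  # skip lines without a tab-separated lineage
        levels = lineage_levels(lineage)
        hit = next((t for t in taxa if t in levels), None)
        counts[hit or "(kept)"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 3:
    taxa = sys.argv[2].split("-")  # e.g. Chloroplast-Mitochondria-unknown-Archaea-Eukaryota
    with open(sys.argv[1]) as handle:
        for label, n in tally_matches(handle, taxa).most_common():
            print(f"{label}\t{n}")
```

For example, `python tally_lineage.py stability.files.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy Chloroplast-Mitochondria-unknown-Archaea-Eukaryota` (hypothetical script name). If nearly everything lands under `unknown`, the sequences themselves are probably the issue rather than the reference files, and a few of them are good candidates to post.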

I suspect it’s something upstream, but if you post one of the problematic sequences, we can take a closer look.

Pat

Thank you, but it was my fault! All fixed now.