classify.seqs gives "is bad" warning?

In the log file for classify.seqs, some sequences have a warning like this:

[WARNING]: gnl|SRA|SRR050598.937.4 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.

Others have a two-line warning like this:

gnl|SRA|SRR050658.2906.4is bad.
[WARNING]: gnl|SRA|SRR050658.2906.4 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.

What does “is bad” mean?
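The two kinds of warning can at least be separated mechanically. Below is a minimal Python sketch that collects the sequence names that got the extra “is bad” line and, separately, those that were only reported as unclassified. The function name and regular expressions are hypothetical and were written only against the 1.30 log lines quoted in this thread (note the missing space before “is bad”); other mothur versions may word these messages differently.

```python
import re

# "<name>is bad." : in the 1.30 output quoted here there is no space
# between the sequence name and "is bad".
IS_BAD = re.compile(r"^(\S+)is bad\.$")
# "[WARNING]: <name> could not be classified. ..."
UNCLASSIFIED = re.compile(r"^\[WARNING\]: (\S+) could not be classified\.")


def summarize_warnings(log_lines):
    """Return (names flagged 'is bad', other unclassified names), in log order."""
    is_bad, unclassified = [], []
    for line in log_lines:
        line = line.strip()
        match = IS_BAD.match(line)
        if match:
            is_bad.append(match.group(1))
            continue
        match = UNCLASSIFIED.match(line)
        if match:
            unclassified.append(match.group(1))
    flagged = set(is_bad)
    other = [name for name in unclassified if name not in flagged]
    return is_bad, other


if __name__ == "__main__":
    log = [
        "gnl|SRA|SRR050658.2906.4is bad.",
        "[WARNING]: gnl|SRA|SRR050658.2906.4 could not be classified. You can use ...",
        "[WARNING]: gnl|SRA|SRR050598.937.4 could not be classified. You can use ...",
    ]
    # prints (['gnl|SRA|SRR050658.2906.4'], ['gnl|SRA|SRR050598.937.4'])
    print(summarize_warnings(log))
```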

I ran only three commands before classify.seqs (formatted here to make them more readable):

mothur > screen.seqs(fasta=part1.fasta, group=group1.groups, maxambig=0, maxhomop=8, minlength=200, maxlength=500)

mothur > unique.seqs(fasta=part1.good.fasta)

mothur > align.seqs(fasta=part1.good.unique.fasta, reference=silva.bacteria.fasta, flip=t)

mothur > classify.seqs(fasta=part1.good.unique.align, name=part1.good.names, group=group1.good.groups, taxonomy=trainset9_032012.pds.tax, template=trainset9_032012.pds.fasta, cutoff=60)
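For reference, here is a rough Python reading of what the screen.seqs thresholds in the first command check on each unaligned read. This is a sketch under assumptions, not mothur’s code: it treats any character other than A, C, G, or T as ambiguous, and it treats maxhomop as the longest allowed run of a single base. The function names are hypothetical.

```python
import re

# Assumption: anything other than A/C/G/T counts as an ambiguous base.
AMBIGUOUS = re.compile(r"[^ACGT]")


def longest_homopolymer(seq):
    """Length of the longest run of one repeated character (0 for an empty string)."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def passes_screen(seq, maxambig=0, maxhomop=8, minlength=200, maxlength=500):
    """Approximate the screen.seqs settings used above on an unaligned read."""
    seq = seq.upper()
    return (
        len(AMBIGUOUS.findall(seq)) <= maxambig
        and longest_homopolymer(seq) <= maxhomop
        and minlength <= len(seq) <= maxlength
    )
```

Every check here looks only at the raw read, so a sequence can pass all of them and still align poorly against the reference.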

I’m using mothur version 1.30
Thanks.

Can you run get.seqs to pull that sequence out and post it for us?
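Within mothur, get.seqs does this from an accnos file listing the sequence names. Purely as an illustration of the same filtering outside mothur, here is a small Python sketch; the function names are hypothetical, and taking the first whitespace-delimited word after “>” as the sequence name is an assumption.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs; the name is the first word after '>'."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def get_seqs(fasta_lines, wanted):
    """Keep only the records whose name is in `wanted`, in file order."""
    wanted = set(wanted)
    return [(n, s) for n, s in read_fasta(fasta_lines) if n in wanted]


if __name__ == "__main__":
    with open("part1.good.unique.fasta") as handle:  # hypothetical path
        for name, seq in get_seqs(handle, ["gnl|SRA|SRR050658.2906.4"]):
            print(f">{name}\n{seq}")
```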

Hi Pat,

Here are three sequences from the same file that had the “is bad” warning.

>gnl|SRA|SRR050658.2906.4
TCCCGGGACCCGGACCCGTACCCTACGGTAACGAAAACGAACGAACGTTCCCTTTAACGTAATTCGGTTTTCCCCCGTTTTCCCCCGTACCGTTACGTTCGACTAGTTGGTTTAAGGTTAACCGTCCGTTAGGTACGTACGTACGGTTCCGTTACTACCTTCCGTAGGACCGACCGGAAGGTACTAAACCTAAACCTTCGTACGTACGTAGGACGGGGCCGGGTCGGTCGGAACGGAACGGGCCGACTAAGT

>gnl|SRA|SRR050667.385.4
GTTCGTCGGGCCGTGTCTCTACGTACCCAATGGGTCGGTCCCGTTCACCCTCTACCCAGGCCGGCTATGGGATCGGTCGGCCCTTCCGGTCGGGCCGTTACCCTACACCGAAACTAGCTAACCTCCAACGTCCGGGTACCAATCTTTAATACCGACCGGAGTTTTCACACCGAGCCAATGCACGCTCTGTGCGGCTTAATGTCGGTTATTAAGGCAGTCCATTCTAGACGTGTATCCCCTGTATAGCCAGGTACCACGCGTACTCACCCGTACCGTCCACTAAGAACCAAGTCTAAATCTGCCGAACCGCTTCTAATAAAGGTTCCGTTCGACTTAGCCATGTGTTAAGCTACGCCGCCAGCGTCAGTCCTGAAGCCAGGATCGAAACTC

>gnl|SRA|SRR050694.13199.4
GTTAGTCGGACGTGTCTCACGTTACCATTGGTCGGGGGACCTTCCTCTCAGGAACCCCGTACCGATCGTAGCCTTGGTGGGCCGTTACCCGGCCGAACCTTACGCTAACTACGGACGCGAAGCCAATCCCGTCGCCGGCCCGTAACTTTCCAACGGAGTACCCAATTGGACGGGTCCTCCGTCCCTAATCGGGGATTAGTCGGACGTTTCCACCGGTTGTCCCGGGCAACGGGCATCGGTTCACTCACCGTCGTTACGACTACCTTCCGGCCGGTCGCCGCCGAGAACGTACTTGCTTGCCCCGCCGGTCCTCGCCCCTTACGGAACTTTAGGAACATAGGTCGTTTAACGTCCCTTGTCCGTCTA

Hi there,

So when we classify those three sequences using 1.30, they classify fine; however, we get the warning:

[WARNING]: mothur reversed some your sequences for a better classification. If you would like to take a closer look, please check …/…/classify/test.pds.wang.flip.accnos for the list of the sequences.

Starting with version 1.24, we began to flip sequences that got a bad classification to see whether they classified better in the opposite orientation. It doesn’t sound like this is happening for you. Is it possible that you’re using an older version of mothur? You can double-check the version number by running mothur and looking at the top of the screen. Sometimes people download a new version but still end up running the old one because of how their paths are set up.

Pat
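The flipping idea can be illustrated with a toy example: score both strands against the reference and keep the better one. The sketch below uses the number of 8-mers shared with a reference k-mer set as the score; that scoring rule and the function names are assumptions for illustration only, not mothur’s actual implementation.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k=8):
    """Set of overlapping k-mers (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_orientation(seq, reference_kmers, k=8):
    """Return (sequence, flipped) for the strand sharing more k-mers with the reference."""
    forward = len(kmers(seq, k) & reference_kmers)
    rc = reverse_complement(seq)
    reverse = len(kmers(rc, k) & reference_kmers)
    if reverse > forward:
        return rc, True
    return seq, False
```

Ties keep the original orientation, so a sequence is only flipped when the other strand actually scores better.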


Well, I’m pretty sure I’m using version 1.30.
Below is the log file I get.

There’s an “is bad” line, and the sequences can’t be classified.
Some condition must trigger this “is bad” message, and I would like to know what it is.
Aside from that, should I trust the classification results for the other sequences (which I assume are normal, since they don’t come with an “is bad” message)?
Thank you.

Windows version

Running 64Bit Version

mothur v.1.30.2
Last updated: 4/19/2013

by
Patrick D. Schloss

Department of Microbiology & Immunology
University of Michigan
pschloss@umich.edu
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type 'help()' for information on the commands that are available

Type 'quit()' to exit program
Interactive Mode


mothur > classify.seqs(fasta=../../../Metagenomics/datasets/candidates/test/0422/part1.pick.good.unique.align, name=../../../Metagenomics/datasets/candidates/test/0422/part1.pick.good.names, taxonomy=../../../Tools/RDP_v9/trainset9_032012.pds.tax, template=../../../Tools/RDP_v9/trainset9_032012.pds.fasta, cutoff=60)

Using 1 processors.
Reading template taxonomy...     DONE.
Reading template probabilities...     DONE.
It took 26 seconds get probabilities. 
Classifying sequences from ../../../Metagenomics/datasets/candidates/test/0422/part1.pick.good.unique.align ...
gnl|SRA|SRR050658.2906.4is bad.
[WARNING]: gnl|SRA|SRR050658.2906.4 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
gnl|SRA|SRR050667.385.4is bad.
[WARNING]: gnl|SRA|SRR050667.385.4 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
gnl|SRA|SRR050694.13199.4is bad.
[WARNING]: gnl|SRA|SRR050694.13199.4 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
Processing sequence: 3

It took 0 secs to classify 3 sequences.

Reading ../../../Metagenomics/datasets/candidates/test/0422/part1.pick.good.names...  Done.

It took 0 secs to create the summary file for 3 sequences.


Output File Names: 
../../../Metagenomics/datasets/candidates/test/0422/part1.pick.good.unique.pds.wang.taxonomy
../../../Metagenomics/datasets/candidates/test/0422/part1.pick.good.unique.pds.wang.tax.summary

Weird; we think this might be a Windows issue. To help track it down, can you compress part1.pick.good.unique.align and email it to mothur.bugs@gmail?

I’ve just emailed the files (subject: classify.seqs gives “is bad” warning).
I hope this strange behavior doesn’t affect the other classified sequences…

Thanks for sending your log file and sequences. The “is bad” message is printed when a sequence has no k-mers. The alignment is really bad: each of the aligned sequences has fewer than 8 bases, so there are no k-mers of length 8 and mothur is unable to classify them. I will modify the “is bad” warning to explain what is happening.
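The condition described above comes down to a length check: an 8-mer needs at least 8 bases, so a sequence with fewer than 8 bases once the alignment gaps are removed yields zero 8-mers. Here is a minimal Python sketch of that check, assuming gaps are written as “-” and “.” in the .align file; the function names are hypothetical and this is not mothur’s own code.

```python
GAP_CHARS = "-."  # assumption: the gap characters used in the .align file
_DROP_GAPS = str.maketrans("", "", GAP_CHARS)


def degap(aligned):
    """Remove alignment gap characters, leaving only the bases."""
    return aligned.translate(_DROP_GAPS)


def kmers(seq, k=8):
    """All overlapping k-mers of seq; empty when seq has fewer than k bases."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def has_no_kmers(aligned, k=8):
    """True when the degapped sequence is too short to yield a single k-mer."""
    return not kmers(degap(aligned), k)


if __name__ == "__main__":
    print(has_no_kmers("--A.C-G--T.."))  # True: only 4 bases remain
    print(has_no_kmers("ACGTACGT"))      # False: exactly one 8-mer
```

Running a check like this over the .align file before classify.seqs would flag such sequences early, which also suggests the remaining sequences were not affected by this particular problem.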