Dear all,
I would like to ask for your help with the classify.seqs command. I am running it in a container (which has worked very well until now) with mothur v.1.48.0, analyzing mock community sequences.
I run the command as follows:
mothur > classify.seqs(fasta=stability.trim.contigs.unique.filter.fasta, count=stability.trim.contigs.unique.filter.count_table, reference=Databases/silva_v132.fasta, taxonomy=Databases/silva.nr_v132.tax, outputdir=XXX).
I receive this message:
Using 16 processors.
Reading template taxonomy… DONE.
Reading template probabilities… DONE.
It took 30 seconds get probabilities.
Classifying sequences from Analysis/mock_communities/mothur/stability.trim.contigs.unique.filter.fasta …
M02765_22_000000000-AJBRW_1_2107_4690_15120 is bad. It has no kmers of length 8.
[WARNING]: M02765_22_000000000-AJBRW_1_2107_4690_15120 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
The "is bad" and [WARNING] messages repeat for many sequences, and the run ends with a summary of 2 error messages and 82126 warning messages. The problem is that I do not understand what is wrong with these sequences.
NOTE: before classify.seqs I ran make.files, make.contigs, unique.seqs, align.seqs, filter.seqs, unique.seqs, cluster.split (with label 0.20, as suggested in the SOP and manual) and make.shared, and I have stopped here, before classify.otu.
(I am not interested in screen.seqs because I work on a server, so there is no risk of my computer melting down from the effort.)
Thanks in advance to those that will try to help!
Cheers
Hey there - I think you’re getting these warning messages because the sequences are bad. You would remove those if you used screen.seqs. I’m wondering where you got your protocol from, since it’s been some time since we suggested a cutoff of 0.20. The screen.seqs step isn’t about conserving RAM; it’s really about removing sequences that are problematic.
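To see which sequences are likely to trigger the "no kmers of length 8" message, one option is to count the bases left in each sequence once the alignment gap characters are ignored. The Python sketch below is only a rough diagnostic, not part of mothur: the function names are made up for illustration, and it assumes that a sequence with fewer than 8 A/C/G/T/U bases is one that cannot yield an 8-mer.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def too_short_for_kmers(lines, ksize=8):
    """Return names of sequences with fewer than `ksize` A/C/G/T/U bases.

    Gap characters ('.' and '-') and ambiguous bases are not counted.
    Assumption: such sequences are the ones that cannot produce a k-mer.
    """
    short = []
    for name, seq in read_fasta(lines):
        bases = sum(1 for ch in seq.upper() if ch in "ACGTU")
        if bases < ksize:
            short.append(name)
    return short


# Tiny in-memory example with hypothetical sequence names.
example = [">seq_ok", "ACGT--ACGTAC..", ">seq_bad", "....--AC-G...."]
print(too_short_for_kmers(example))  # ['seq_bad']
```

To run it on real data, pass an open file handle instead of the example list, for instance the filtered fasta file given to classify.seqs. If most of the flagged names match the ones in the warnings, that supports the idea that poorly aligned reads were left nearly empty after filtering, which is what screen.seqs is meant to catch earlier.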
Thank you very much for your response. I took the protocol from the same address you are pointing me to, just trying to make it a little lighter by keeping only the commands I strictly need (and having some fun!). I found the 0.03 cutoff in the SOP, but only saw the 0.20 cutoff suggested for cluster.split, so I assumed the SOP value was specific to the dataset it works with.
Since adding screen.seqs, I am getting errors such as: [ERROR]: M02765_22_000000000-AJBRW_1_1114_19191_6585 is not in your count table. Please correct.
I checked and, in fact, this name (M02765_22_000000000-AJBRW_1_1114_19191_6585, as one example) appears in the count_table but not in the fasta file. Can you explain why, or point me in the right direction on this issue as well? Should I modify my fasta file manually?
You should only use 0.03 for the cutoff, regardless of the dataset.
I’d suggest going back to previous steps until you don’t get the error message. This typically happens because you are using the wrong fasta and count_tables - one of them has been processed more than the other.
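A quick way to find the step where the two files diverge is to compare the sequence names in each fasta/count_table pair. The Python sketch below is a hypothetical helper, not a mothur command. It assumes the count table is tab- or space-delimited with the sequence name in the first column, a header row starting with Representative_Sequence, and possibly comment lines starting with #.

```python
def fasta_names(lines):
    """Collect sequence names from the '>' header lines of a FASTA file."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def count_table_names(lines):
    """Collect sequence names from the first column of a count table.

    Assumptions: lines starting with '#' are comments, and the header row
    starts with 'Representative_Sequence'.
    """
    names = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        first = line.split()[0]
        if first == "Representative_Sequence":
            continue
        names.add(first)
    return names


def compare_names(fasta_lines, count_lines):
    """Return (names only in the fasta, names only in the count table)."""
    in_fasta = fasta_names(fasta_lines)
    in_count = count_table_names(count_lines)
    return sorted(in_fasta - in_count), sorted(in_count - in_fasta)


# Tiny in-memory example with hypothetical names.
fasta = [">s1", "ACGT", ">s2", "ACGT"]
count = ["Representative_Sequence\ttotal", "s1\t3", "s3\t1"]
print(compare_names(fasta, count))  # (['s2'], ['s3'])
```

Running this on the files produced at each step, working backwards, should show the last pair where both lists are empty. That is the point to restart from with matching files, rather than editing the fasta file by hand.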