classify.seqs in MiSeq SOP

Hello!

I am going through the MiSeq SOP (using the data also provided in the SOP) to get a good handle on how it works. However, I seem to have hit a wall at the classify.seqs step and I’m not sure why. I enter the command as written in the SOP and end up with a long list of “[WARNING]: M00967_43_000000000-A3JHG_1_1105_15500_1801 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.” I end up with a summary file for 2108 sequences. When I go on to the remove.lineage command, I get another list of “Removing group: F3D0_S188 because all sequences have been removed.” When I try the next command, I get, “[ERROR]: MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta is blank. Please correct.
Error in reading your fastafile, at position -1. Blank name.”

I’ll also add that when I did the remove.seqs command a few steps ahead of this part, my summary.seqs table did not have the same values as in the MiSeq SOP. My total # of seqs was 118080 instead of 118150.

So, I’m not sure why my numbers start to drift from the SOP at this point, or why my classify.seqs command is not turning out how it’s supposed to.

I am still trying to wrap my head around this, so my apologies if the solution to this is very simple. Any advice would be greatly appreciated. Thanks!

P

Can you post the exact commands you ran? Specifically the one for classify.seqs().

Cheers
Richard

Hi Richard,

Here are the commands I did starting from the remove.seqs command:

mothur > remove.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, accnos=stability.trim.contigs.good.unique.good.flter.unique.precluster.denovo.vsearch.accnos)
Unable to open stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.accnos. Trying default /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.accnos
Unable to open stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta. Trying default /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta
[WARNING]: This command can take a namefile and you did not provide one. The current namefile is /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.names which seems to match /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta.
Removed 3324 sequences from your fasta file.

Output File Names:
/Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta


mothur > summary.seqs(fasta=current, count=current)
Using /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table as input file for the count parameter.
Using /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta as input file for the fasta parameter.

Using 1 processors.

            Start  End  NBases   Ambigs  Polymer  NumSeqs
Minimum:    1      376  249      0       3        1
2.5%-tile:  1      376  252      0       3        2953
25%-tile:   1      376  252      0       4        29521
Median:     1      376  252      0       4        59041
75%-tile:   1      376  253      0       5        88561
97.5%-tile: 1      376  253      0       6        115129
Maximum:    1      376  256      0       8        118080
Mean:       1      376  252.464  0       4.37569

# of unique seqs: 2108
total # of seqs: 118080

Output File Names:
/Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.summary

It took 0 secs to summarize 118080 sequences.

mothur > classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.ood.filter.unique.precluster.denovo.vsearch.pick.count_table, reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80)

Unable to open stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta. Trying default /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta
Unable to open stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table. Trying default /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table

Using 1 processors.
Unable to open trainset9_032012.pds.fasta. Trying default /Users/priyamistry/Desktop/MiSeq_SOP/trainset9_032012.pds.fasta
Unable to open trainset9_032012.pds.tax. Trying default /Users/priyamistry/Desktop/MiSeq_SOP/trainset9_032012.pds.tax
Generating search database… DONE.
It took 8 seconds generate search database.

Reading in the /Users/priyamistry/Desktop/MiSeq_SOP/trainset9_032012.pds.tax taxonomy… DONE.
Calculating template taxonomy tree… DONE.
Calculating template probabilities… DONE.
It took 18 seconds get probabilities.
Classifying sequences from /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta …
[WARNING]: M00967_43_000000000-A3JHG_1_1101_13234_1983 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.

In your classify.seqs command you’ve used:

mothur > classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.ood.filter.unique.precluster.denovo.vsearch.pick.count_table, reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80)

There is a typo in the count file name. Try correcting it and see if that fixes your problem.
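
A quick way to catch this kind of slip is to compare the name typed in the command against the files actually present. This is only a minimal Python sketch, not a mothur feature; `suggest_filename` is a hypothetical helper and the 0.8 similarity threshold is an arbitrary assumption.

```python
import difflib


def suggest_filename(typed, existing, n=1):
    """Return up to `n` names from `existing` that closely match `typed`.

    `typed` is the file name as written in the command; `existing` is a list
    of file names that are really in the working directory.
    """
    if typed in existing:
        # Exact match: nothing to correct.
        return [typed]
    # The 0.8 threshold is an assumption; lower it to allow looser matches.
    return difflib.get_close_matches(typed, existing, n=n, cutoff=0.8)
```

Passing the count file name from the command together with a directory listing (for example from `os.listdir`) returns the closest real file name, or an empty list if nothing is similar enough.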

Cheers
Richard

Hi Richard,

I tried it again with the correction (I think when the command is too long it cuts out a letter and moves to the next line… so it’s all still there). I got the same error unfortunately:

mothur > remove.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, accnos=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.accnos)
Unable to open stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.accnos. Trying default /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.accnos
Unable to open stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta. Trying default /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta
[WARNING]: This command can take a namefile and you did not provide one. The current namefile is /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.names which seems to match /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta.
Removed 3324 sequences from your fasta file.

Output File Names:
/Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta


mothur > summary.seqs(fasta=current, count=current)
Using /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table as input file for the count parameter.
Using /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta as input file for the fasta parameter.

Using 1 processors.

            Start  End  NBases   Ambigs  Polymer  NumSeqs
Minimum:    1      376  249      0       3        1
2.5%-tile:  1      376  252      0       3        2953
25%-tile:   1      376  252      0       4        29521
Median:     1      376  252      0       4        59041
75%-tile:   1      376  253      0       5        88561
97.5%-tile: 1      376  253      0       6        115129
Maximum:    1      376  256      0       8        118080
Mean:       1      376  252.464  0       4.37569

# of unique seqs: 2108
total # of seqs: 118080

Output File Names:
/Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.summary

It took 0 secs to summarize 118080 sequences.

mothur > classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80)
Unable to open stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta. Trying default /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta
Unable to open stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table. Trying default /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table

Using 1 processors.
Unable to open trainset9_032012.pds.fasta. Trying default /Users/priyamistry/Desktop/MiSeq_SOP/trainset9_032012.pds.fasta
Unable to open trainset9_032012.pds.tax. Trying default /Users/priyamistry/Desktop/MiSeq_SOP/trainset9_032012.pds.tax
Generating search database… DONE.
It took 9 seconds generate search database.

Reading in the /Users/priyamistry/Desktop/MiSeq_SOP/trainset9_032012.pds.tax taxonomy… DONE.
Calculating template taxonomy tree… DONE.
Calculating template probabilities… DONE.
It took 19 seconds get probabilities.
Classifying sequences from /Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta …
[WARNING]: M00967_43_000000000-A3JHG_1_1101_13234_1983 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.

P

Can you post your whole log file? (Maybe post it on pastebin and provide the link here).

Cheers
Richard

Hi Richard,

Here is the link to my log file for the run:

http://pastebin.com/JYL2Vvxb

Hopefully we can resolve this!

P

Hmmm… so I don’t see any obvious reason why it’s not working.

What do the contents of “/Users/priyamistry/Desktop/MiSeq_SOP/stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy” look like? Are all your sequences being classified as unknown?

Also, I can’t see it in the logfile: what version of mothur are you running?
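
If the taxonomy file is too long to eyeball, tallying the first taxon of every line gives a quick answer. The following is a minimal Python sketch, not part of mothur. It assumes each line holds a sequence name, a tab, and a semicolon-separated lineage with optional confidence values in parentheses; `tally_top_level` is a hypothetical helper name.

```python
from collections import Counter


def tally_top_level(taxonomy_lines):
    """Count sequences by the first (top-level) taxon of their lineage.

    Assumed line format: 'name<TAB>taxon(conf);taxon(conf);...'
    """
    counts = Counter()
    for line in taxonomy_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _name, _, lineage = line.partition("\t")
        first = lineage.split(";")[0]
        # Drop a trailing confidence such as "(100)" if one is present.
        top = first.split("(")[0] or "(empty)"
        counts[top] += 1
    return counts
```

Calling `tally_top_level(open(path))` on the .taxonomy file and printing the result shows at a glance whether everything falls under `unknown` or only a handful of reads do.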

R

Hi!

Yes I believe all my sequences are coming out as “unknown”.

I’m running version 1.39.2 on my mac (version 10.12.3).

P

Hmmm… so mothur is deciding that none of your sequences classify as anything in the supplied reference db. This could be because your sequences are bad, or the reference is bad, or mothur is being bad. Because you’re following the SOP with the SOP data, I really don’t know what’s going wrong at this point.

My computer is currently deep into its own analysis run so I can’t run the SOP myself right now to see if it works.

One question I have is where are you running your analysis from? I see mothur keeps complaining that it can’t find the files in every command. Is mothur in your path?
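
Both parts of that question can be checked outside mothur. This is a minimal Python sketch under stated assumptions: `check_setup` is a hypothetical helper, the executable is assumed to be named `mothur`, and the list of file names is whatever the command in question expects.

```python
import os
import shutil


def check_setup(workdir, filenames, program="mothur"):
    """Report whether `program` is on PATH and which files are missing from `workdir`."""
    missing = [
        name for name in filenames
        if not os.path.isfile(os.path.join(workdir, name))
    ]
    return {
        # shutil.which returns None when the executable is not found on PATH.
        "program_on_path": shutil.which(program) is not None,
        "missing_files": missing,
    }
```

Running it with `os.getcwd()` as `workdir` shows what is visible from the directory the session was started in, which may differ from the folder that holds the data.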

R

Hi Richard,

No problem.

I put mothur in my MiSeq_SOP folder on my Desktop, along with everything else that was in the folder with it when I downloaded it (I think it says to do this in the SOP as well). To start the analysis I would just double-click on mothur to open it and go from there. When I tried it out initially I started in Terminal and opened mothur through different directories etc. (eventually using ./mothur), but I think I ran into issues there too. I could always try this again and see if it helps.

Definitely confused!

P

I uploaded version 1.39.3 which corrects a bug in classify.seqs. You can download it here: https://github.com/mothur/mothur/releases/tag/v1.39.3.

Thank you so much. All is good!

I’ve run into exactly this same problem. What is decidedly strange is that I can run a set of files in 1.36 with no problem at all. I take the same set and run them in 1.37, 1.38 and 1.39.2 and the output is all “unknown.” Just to be absolutely sure I have not made an error, I have repeated this twice. I normally use classify.seqs with fasta, count, template or reference and taxonomy options. I am running on a 10.12.2 Mac, but I am pretty sure that isn’t an issue, because this problem has turned up on a student’s older Mac Air. I haven’t determined whether this is also an issue in the Windows versions, but it might not be. I have another student who reported no problems and she is using 1.38 or 1.39 with Windows 10 (I think).

For the moment, I have simply run classify.seqs in 1.36 and then dumped the output back into 1.39.2. Obviously one can’t do a batch run this way, but it solved the problem.

Any ideas out there?

I did not see the reference to yet another 1.39 update.

With my seqs, the issue is clearly the default cutoff at 80. I have a lot of arkie seqs in my files and they are not getting classified well. No surprise there. By varying the cutoff from 0 up, one can control the assignments and “unknown” output. What this suggests to me is simply that one has to look carefully at the OTU assignments and use the output thoughtfully rather than just making assumptions about the classification. There might have been a change in the cutoff default after 1.36 that would explain the difference in output.
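
To make the effect of the cutoff concrete, here is a simplified Python illustration. It is not mothur's implementation: `apply_cutoff` is a hypothetical helper, the lineage and confidence values in the usage note are made up, and the rule (keep ranks until the first one below the cutoff, report `unknown;` if none survive) is an assumption about the general idea only.

```python
import re

# Matches one rank of the form "Name(confidence)", e.g. "Archaea(72)".
_TAXON = re.compile(r"^(?P<name>.+)\((?P<conf>\d+)\)$")


def apply_cutoff(lineage, cutoff):
    """Keep ranks until the first one whose confidence is below `cutoff`."""
    kept = []
    for rank in filter(None, lineage.split(";")):
        match = _TAXON.match(rank)
        if match is None or int(match.group("conf")) < cutoff:
            break  # this rank and everything below it is dropped
        kept.append(rank)
    # If not even the top rank passes, the sequence is reported as unknown.
    return (";".join(kept) + ";") if kept else "unknown;"
```

With the made-up lineage `Archaea(72);Euryarchaeota(55);Methanomicrobia(40);`, a cutoff of 80 leaves nothing, a cutoff of 50 keeps the first two ranks, and a cutoff of 0 keeps all three.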

Can you try again but use 1.39.3? We had a small bug…

Pat