classify.seqs

Hi,
I tried to classify my MiSeq archaeal sequences with the greengenes databases (both the new and the old version) using the following command:

classify.seqs(fasta=arch.trim.contigs.good.unique.good.filter.unique.precluster.pick.abund.fasta, count=arch.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.abund.count_table, reference=gg_99.pds.ng.fasta, taxonomy=gg_99.pds.tax, cutoff=80, processors=12)

But every time I get the following error message from my server:

ModuleCmd_Load.c(204):ERROR:105: Unable to locate a modulefile for 'mothur/1.32.1'
Your job has been killed.
This may happen if one of the followings hold :

  • you exceeded one of the queue/job limits (run time, memory, etc)
  • you (or admin) killed the job using qdel
  • something bad happened.
    Now, just in case something bad happened, here are the debug information about your job :
    total 4
    -rw-r--r-- 1 jauguet UMR5254 50 22 juil. 13:11 job_killed
    JOB INFO :
    total 40
    -rw-r--r-- 1 sgeadmin sgeadmin 34 22 juil. 13:05 pe_hostfile
    -rw-r--r-- 1 sgeadmin sgeadmin 6440 22 juil. 13:05 environment
    -rw-r--r-- 1 sgeadmin sgeadmin 2051 22 juil. 13:05 config
    -rw-r--r-- 1 sgeadmin sgeadmin 6 22 juil. 13:05 pid
    -rw-r--r-- 1 sgeadmin sgeadmin 6 22 juil. 13:05 job_pid
    -rw-r--r-- 1 sgeadmin sgeadmin 6 22 juil. 13:05 addgrpid
    -rw-r--r-- 1 jauguet UMR5254 0 22 juil. 13:05 error
    -rw-r--r-- 1 jauguet UMR5254 0 22 juil. 13:05 exit_status
    -rw-r--r-- 1 sgeadmin sgeadmin 319 22 juil. 13:11 usage
    -rw-r--r-- 1 jauguet UMR5254 6556 22 juil. 13:11 trace
    STATUS :
    TRACE :
    07/22/2014 13:05:47 [500:43604]: shepherd called with uid = 0, euid = 500
    07/22/2014 13:05:47 [500:43604]: starting up 2011.11
    07/22/2014 13:05:47 [500:43604]: setpgid(43604, 43604) returned 0
    07/22/2014 13:05:47 [500:43604]: do_core_binding: “binding” parameter not found in config file
    07/22/2014 13:05:47 [500:43604]: no prolog script to start
    07/22/2014 13:05:47 [500:43604]: parent: forked “job” with pid 43606
    07/22/2014 13:05:47 [500:43604]: parent: job-pid: 43606
    07/22/2014 13:05:47 [500:43606]: child: starting son(job, /var/spool/sge/ceri025/job_scripts/6709200, 0);
    07/22/2014 13:05:47 [500:43606]: pid=43606 pgrp=43606 sid=43606 old pgrp=43604 getlogin()=
    07/22/2014 13:05:47 [500:43606]: reading passwd information for user ‘jauguet’
    07/22/2014 13:05:47 [500:43606]: setosjobid: uid = 0, euid = 500
    07/22/2014 13:05:47 [500:43606]: setting limits
    07/22/2014 13:05:47 [500:43606]: RLIMIT_CPU setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
    07/22/2014 13:05:47 [500:43606]: RLIMIT_FSIZE setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
    07/22/2014 13:05:47 [500:43606]: RLIMIT_DATA setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
    07/22/2014 13:05:47 [500:43606]: RLIMIT_STACK setting: (soft 268435456, hard 268435456) resulting: (soft 268435456, hard 268435456)
    07/22/2014 13:05:47 [500:43606]: RLIMIT_CORE setting: (soft 0, hard 0) resulting: (soft 0, hard 0)
    07/22/2014 13:05:47 [500:43606]: RLIMIT_MEMLOCK setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
    07/22/2014 13:05:47 [500:43606]: RLIMIT_VMEM/RLIMIT_AS setting: (soft 8589934592, hard 8589934592) resulting: (soft 8589934592, hard 8589934592)
    07/22/2014 13:05:47 [500:43606]: RLIMIT_RSS setting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY)) resulting: (soft 18446744073709551615(INFINITY), hard 18446744073709551615(INFINITY))
    07/22/2014 13:05:47 [500:43606]: setting environment
    07/22/2014 13:05:47 [500:43606]: Initializing error file
    07/22/2014 13:05:47 [500:43606]: switching to intermediate/target user
    07/22/2014 13:05:47 [13532:43606]: closing all filedescriptors
    07/22/2014 13:05:47 [13532:43606]: further messages are in “error” and “trace”
    07/22/2014 13:05:47 [13532:43606]: now running with uid=13532, euid=13532
    07/22/2014 13:05:47 [13532:43606]: execvp(/var/spool/sge/ceri025/job_scripts/6709200, “/var/spool/sge/ceri025/job_scripts/6709200”)
    07/22/2014 13:11:04 [500:43604]: wait3 returned -1
    07/22/2014 13:11:04 [500:43604]: forward_signal_to_job(): mapping signal 20 TSTP
    07/22/2014 13:11:04 [500:43604]: mapped signal TSTP to signal KILL
    07/22/2014 13:11:04 [500:43604]: queued signal KILL
    07/22/2014 13:11:04 [500:43604]: /SGE/GE2011.11/inra/tools/terminate_method.sh $job_owner $job_pid -> overriddes kill(-43606, KILL)
    07/22/2014 13:11:04 [500:52460]: starting terminate_method command: /SGE/GE2011.11/inra/tools/terminate_method.sh jauguet 43606
    07/22/2014 13:11:04 [13532:52460]: start_as_command: pre_args_ptr[0] = argv0; “/SGE/GE2011.11/inra/tools/terminate_method.sh jauguet 43606” shell_path = “/bin/bash”
    07/22/2014 13:11:04 [13532:52460]: execvp(/bin/bash, “/SGE/GE2011.11/inra/tools/terminate_method.sh jauguet 43606” “-c” “/SGE/GE2011.11/inra/tools/terminate_method.sh jauguet 43606”)
    07/22/2014 13:11:05 [500:43604]: wait3 returned -1
    07/22/2014 13:11:05 [500:43604]: forward_signal_to_job(): mapping signal 20 TSTP
    07/22/2014 13:11:05 [500:43604]: mapped signal TSTP to signal KILL
    07/22/2014 13:11:05 [500:43604]: queued signal KILL
    07/22/2014 13:11:05 [500:43604]: /SGE/GE2011.11/inra/tools/terminate_method.sh $job_owner $job_pid -> overriddes kill(-43606, KILL)
    07/22/2014 13:11:05 [500:43604]: Skipped start of suspend: previous command (pid= 52460) is still active
    07/22/2014 13:11:06 [500:43604]: wait3 returned -1
    07/22/2014 13:11:06 [500:43604]: forward_signal_to_job(): mapping signal 20 TSTP
    07/22/2014 13:11:06 [500:43604]: mapped signal TSTP to signal KILL
    07/22/2014 13:11:06 [500:43604]: queued signal KILL
    07/22/2014 13:11:06 [500:43604]: /SGE/GE2011.11/inra/tools/terminate_method.sh $job_owner $job_pid -> overriddes kill(-43606, KILL)
    07/22/2014 13:11:06 [500:43604]: Skipped start of suspend: previous command (pid= 52460) is still active
    07/22/2014 13:11:06 [500:43604]: wait3 returned 43606 (status: 15; WIFSIGNALED: 1, WIFEXITED: 0, WEXITSTATUS: 0)
    07/22/2014 13:11:06 [500:43604]: job exited with exit status 0
    07/22/2014 13:11:09 [500:43604]: wait3 returned 52460 (status: 0; WIFSIGNALED: 0, WIFEXITED: 1, WEXITSTATUS: 0)
    07/22/2014 13:11:09 [500:43604]: reaped terminate command
    07/22/2014 13:11:09 [500:43604]: reaped “job” with pid 43606
    07/22/2014 13:11:09 [500:43604]: job exited due to signal
    07/22/2014 13:11:09 [500:43604]: job signaled: 15
    07/22/2014 13:11:09 [500:43604]: now sending signal KILL to pid -43606
    07/22/2014 13:11:09 [500:43604]: writing usage file to “usage”
    07/22/2014 13:11:09 [500:43604]: no tasker to notify
    07/22/2014 13:11:09 [500:43604]: parent: forked “epilog” with pid 52620
    07/22/2014 13:11:09 [500:43604]: using signal delivery delay of 120 seconds
    07/22/2014 13:11:09 [500:43604]: parent: epilog-pid: 52620
    07/22/2014 13:11:09 [500:52620]: child: starting son(epilog, /SGE/GE2011.11/inra/tools/epilog_verbose.sh, 0);
    07/22/2014 13:11:09 [500:52620]: pid=52620 pgrp=52620 sid=52620 old pgrp=43604 getlogin()=
    07/22/2014 13:11:09 [500:52620]: reading passwd information for user ‘jauguet’
    07/22/2014 13:11:09 [500:52620]: setting limits
    07/22/2014 13:11:09 [500:52620]: setting environment
    07/22/2014 13:11:09 [500:52620]: Initializing error file
    07/22/2014 13:11:09 [500:52620]: switching to intermediate/target user
    07/22/2014 13:11:09 [13532:52620]: closing all filedescriptors
    07/22/2014 13:11:09 [13532:52620]: further messages are in “error” and “trace”
    07/22/2014 13:11:09 [13532:52620]: using “/bin/bash” as shell of user “jauguet”
    07/22/2014 13:11:09 [13532:52620]: now running with uid=13532, euid=13532
    07/22/2014 13:11:09 [13532:52620]: execvp(/SGE/GE2011.11/inra/tools/epilog_verbose.sh, “/SGE/GE2011.11/inra/tools/epilog_verbose.sh”)
    ERROR :


The mothur logfile is as follows:

mothur > classify.seqs(fasta=arch.trim.contigs.good.unique.good.filter.unique.precluster.pick.abund.fasta, count=arch.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.abund.count_table, reference=gg_99.pds.ng.fasta, taxonomy=gg_99.pds.tax, cutoff=80, processors=12)
Using 12 processors.
Generating search database… DONE.
It took 89 seconds generate search database.

Reading in the gg_99.pds.tax taxonomy… DONE.
Calculating template taxonomy tree… DONE.
Calculating template probabilities… DONE.
It took 317 seconds get probabilities.
Classifying sequences from arch.trim.contigs.good.unique.good.filter.unique.precluster.pick.abund.fasta …
[WARNING]: M02233_62_000000000-A9GLW_1_1109_23923_13093 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_50_000000000-A92MB_1_1105_3231_10647 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_50_000000000-A92MB_1_1118_12817_9845 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_50_000000000-A92MB_1_2118_24109_16291 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_50_000000000-A92MB_1_2112_6934_15906 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_50_000000000-A92MB_1_1104_8028_13594 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_62_000000000-A9GLW_1_2117_8803_8563 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_62_000000000-A9GLW_1_1106_27288_9106 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_50_000000000-A92MB_1_2113_14497_12782 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_50_000000000-A92MB_1_1104_15585_12202 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_62_000000000-A9GLW_1_2117_26720_12683 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_50_000000000-A92MB_1_2118_16821_22765 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_62_000000000-A9GLW_1_2117_26239_9001 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_50_000000000-A92MB_1_2112_3567_13138 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_50_000000000-A92MB_1_1116_10699_11513 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_50_000000000-A92MB_1_2118_16255_8301 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_50_000000000-A92MB_1_2112_17899_18506 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_50_000000000-A92MB_1_1103_19986_24223 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02233_50_000000000-A92MB_1_2112_17799_24535 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.

I do not know if I’m doing something wrong, but it worked like a charm with the Silva database. To be precise, I am working in batch mode with version 1.32.1. The sequences were aligned with the greengenes database. The number of sequences should not be a problem, since I have only 12120 unique sequences. Is the greengenes database so computationally demanding that I have to increase the number of processors and the RAM for the job?
Thanks in advance for the help.

JC

A few things…

  1. I’m not sure what’s going on with the “Unable to locate a module” error and I can’t make sense of why the scheduler killed it. You should check with your sys admin to see if they can translate what’s going on - that will help a lot. The greengenes database has 200k sequences in it whereas the silva database has 15k sequences. It’s not exactly a 10-fold increase in memory, but it is a bit more.

  2. The warning messages just indicate what they say - mothur couldn’t classify those sequences even at the kingdom level, so you should remove them afterwards with remove.lineage.

  3. I’m not sure whether the reference files you are using are the same as the ones we have posted. Ours have names like gg_13_5_99.pds_gg.tax, gg_13_5_99.fasta, and gg_13_5_99.gg.tax

  4. I don’t think it will matter, but you might try to upgrade to mothur v. 1.33
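To see how many sequences are affected before running remove.lineage, a minimal Python sketch like the one below could scan the taxonomy output. It assumes the usual two-column layout (sequence name, a tab, then a semicolon-separated lineage) and that unclassified sequences carry `unknown` as their first rank, as the warning text suggests. The function name and the two example lines are made up for illustration and are not taken from the real output.

```python
def unclassified_names(taxonomy_lines):
    """Return the names of sequences whose first rank is 'unknown'."""
    names = []
    for line in taxonomy_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed layout: <sequence name><TAB><rank;rank;...>
        name, _, lineage = line.partition("\t")
        # Take the first rank and drop a confidence suffix such as "(100)".
        first_rank = lineage.split(";")[0].split("(")[0].strip()
        if first_rank.lower() == "unknown":
            names.append(name)
    return names


# Hypothetical example lines, for illustration only.
example = [
    "seqA\tk__Archaea(100);p__Euryarchaeota(95);",
    "seqB\tunknown;",
]
flagged = unclassified_names(example)
print(len(flagged), "sequence(s) could not be classified:", flagged)
```

For a real file, the lines could be read with `open(path)` and passed to the same function.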

Thanks Pat for your reply.
It seems that it is a memory problem, but I keep wondering what good practice would be when the number of unique sequences in MiSeq data is too high (for whatever reason) and the analyses are hampered by computational power (especially at the dist.seqs and cluster steps).
All options (leaving aside phylotype analysis, but is that a valid option?) imply a reduction of the data set.
One solution is subsampling before the dist.seqs step, but I’ve seen another approach (I am not completely sure it is correct) that uses the split.abund command with cutoff=1 to remove singleton sequences. Am I wrong to say that with this second option I may, as an extreme example, get rid of abundant OTUs that are made up of these singleton sequences?
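The extreme case described here can be made concrete with a toy calculation. The sketch below uses entirely hypothetical data: one abundant unique sequence, plus 30 singleton uniques that would all end up in the same OTU if they were clustered. It only illustrates the arithmetic of the scenario and says nothing about how often this happens in real data.

```python
from collections import Counter

# Hypothetical toy data: unique sequence -> (read count, OTU it would join).
uniques = {"abundant_1": (40, "otu_A")}
uniques.update({f"single_{i}": (1, "otu_B") for i in range(30)})


def otu_sizes(table, min_count=1):
    """Sum read counts per OTU, keeping uniques seen at least min_count times."""
    sizes = Counter()
    for count, otu in table.values():
        if count >= min_count:
            sizes[otu] += count
    return dict(sizes)


before = otu_sizes(uniques)              # keep every unique sequence
after = otu_sizes(uniques, min_count=2)  # drop uniques seen only once

print("before removing singletons:", before)
print("after removing singletons: ", after)
```

In this toy setup, the OTU built only from singletons disappears entirely once singletons are dropped, even though it held 30 reads.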
From what I have read on this forum and others, quite a few of us are wondering the same thing…
Cheers

For classify.seqs the memory problem is not related to the input dataset - it reads in one sequence at a time. The memory limitation is due to reading in the database.
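As a rough sanity check, the numbers already in this thread can be put side by side. The sketch below only converts the limits printed in the scheduler trace into readable units and compares the approximate reference sizes quoted above. It does not prove that the memory limit is what killed the job, and the variable names are just for illustration.

```python
# Values copied from the scheduler trace in the first post.
RLIMIT_AS_BYTES = 8589934592    # RLIMIT_VMEM/RLIMIT_AS (soft and hard)
RLIMIT_STACK_BYTES = 268435456  # RLIMIT_STACK (soft and hard)

# Approximate reference sizes quoted earlier in the thread.
GREENGENES_SEQS = 200_000
SILVA_SEQS = 15_000

GIB = 1024 ** 3
MIB = 1024 ** 2

vmem_gib = RLIMIT_AS_BYTES / GIB
stack_mib = RLIMIT_STACK_BYTES / MIB
ref_ratio = GREENGENES_SEQS / SILVA_SEQS

print(f"virtual memory limit: {vmem_gib:.0f} GiB")
print(f"stack limit: {stack_mib:.0f} MiB")
print(f"greengenes vs. silva reference sequences: {ref_ratio:.1f}x")
```

So the job was running under an 8 GiB address-space limit while loading a reference with roughly 13 times as many sequences as the one that worked.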

Pat