mothur

**** Exceeded maximum allowed command errors, quitting ****

I am getting this error “**** Exceeded maximum allowed command errors, quitting ****” during my cluster.split run.

This was my mothur command:

```
cluster.split(fasta=***.fasta, count=***.count_table, taxonomy=***.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15, processors=64)
```

These were my slurm parameters:

```
#SBATCH --qos=shortjobs
#SBATCH --ntasks=1
#SBATCH --mem=300G
#SBATCH --cpus-per-task=64
```

Additional information:
1. During this run, numerous temp files were created, totaling more than 3.5 terabytes of data.
2. After my classify.seqs, I have 756420 sequences.

Hi @echoRG - can you tell us what some of the error messages are that you are getting? If you have 756K unique sequences, I’m not optimistic that this dataset will go through cluster.split. That many unique sequences is typically a symptom of high sequencing error from sequencing something other than the V4 region of the 16S rRNA gene. I’d also encourage you to set your cutoff value to 0.03 rather than 0.15.

Pat

Good day, Dr. Schloss

We sequenced the V3-V4 region of the 16S rRNA gene. Is there a way to decrease the sequencing error using the filtering steps preceding cluster.split/dist.seqs? Or is there a way to decrease the number of unique sequences?

-> I don’t really get any error aside from the server telling me the run was cancelled because space had run out.
-> I followed the mothur SOP, only changing “maxlength” and the taxonomy/alignment databases used (greengenes or silva).

You are likely running out of RAM. Again, see the link I posted in my earlier comment. The problem is that the reads do not fully overlap with each other. Short of resequencing to only get the V4 region or using the phylotype approach, I would suggest setting cutoff=0.03 and taxlevel=6. Also make sure that you’re using the most recent version of mothur.
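Putting those suggestions together, the command would look roughly like this (a sketch only; the `***` filenames stand in for the actual file names from the earlier post, and `processors` should match the cores requested from slurm):

```
cluster.split(fasta=***.fasta, count=***.count_table, taxonomy=***.taxonomy, splitmethod=classify, taxlevel=6, cutoff=0.03, processors=64)
```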

Pat

I haven’t tried cutoff=0.03 and taxlevel=6 yet. During my initial run, I ran it at cutoff=0.20 and taxlevel=4, which ended up with ~3.5 terabytes of temp files. I tried allocating 384 (64 x 4) Gb of RAM for it, but I think the dist file was just too big and the HPC run crashed. I got this error:

```
/hpc/bin/hpc: line 116: 48191 Killed singularity exec $singularity_opts $hpc_images_folder/container "@"
slurmstepd: error: Detected 1 oom-kill event(s) in step 38832.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
```

Is the large distance matrix I am getting due to possible high sequencing error? Or could it be the sheer volume of my input sequences? I have detailed my samples below.

  • We sequenced gut microbiome using primers for V3-V4 region.
  • We started with paired-read L/R .fasta files totaling 15.6 Gb.
  • After the initial make.contigs, I ended up with a 6.5 Gb fasta file with 12 million sequences.
  • After screen.seqs (maxlength=500), I ended up with 7 million sequences.
  • After classify.seqs, I ended up with 700k unique sequences.
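As a rough sanity check on why the matrix gets so large: an all-pairs distance matrix over N unique sequences has N(N−1)/2 entries. A back-of-the-envelope sketch in Python (the ~30 bytes per line is an illustrative assumption for mothur's column-format "seqA seqB dist" text lines, not a measured value):

```python
# Back-of-the-envelope size of an uncut all-pairs distance file
# for N unique sequences. bytes_per_line is a rough assumption.

def dist_pairs(n: int) -> int:
    """Number of unique sequence pairs: n choose 2."""
    return n * (n - 1) // 2

def approx_file_size_tb(n: int, bytes_per_line: int = 30) -> float:
    """Approximate column-format dist file size in terabytes."""
    return dist_pairs(n) * bytes_per_line / 1e12

print(dist_pairs(700_000))                      # ~2.45e11 pairs
print(round(approx_file_size_tb(700_000), 1))   # ~7.3 TB with no cutoff filtering
```

This is why a tight cutoff matters: the cutoff discards distances above the threshold as they are computed, so the file that actually lands on disk is a small fraction of this worst case.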

-> Can mothur handle this large a quantity of samples? If not, is there a way to split cluster.split into 5-8 smaller runs so I would have enough memory (maybe 500 Gb RAM per run)?

-> The version I am using is mothur:1.42.2.

The large distance matrix is because you have a large number of unique sequences. You have a large number of unique sequences because of the high sequencing error rate encountered when pairs of reads do not fully overlap. We have processed much larger datasets than this with the V4 region without much issue. The problem is that you sequenced the V3-V4 region and only have partial overlap of the reads.

If you use cutoff=0.03, then your distance matrix will be much smaller because it will only keep those distances less than 0.03. In addition, using taxlevel=6, you will split the dataset at the genus rather than the order level.

If these options don’t work, I would encourage you to use phylotype to cluster your sequences at the genus level and proceed with a shared file generated using that data.
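For reference, the phylotype route would look something like this (a sketch; the `***` filenames are placeholders, the `.tx.list` name follows mothur's usual phylotype output naming, and label=1 selects the finest taxonomic level, i.e. genus):

```
phylotype(taxonomy=***.taxonomy)
make.shared(list=***.tx.list, count=***.count_table, label=1)
```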

Pat

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.