cluster.split failure

Dilhanide · June 30, 2016, 12:10pm

Dear Pat,
I am a new mothur user. I know you have answered this type of question before, but I would really appreciate some suggestions for my sequencing run issue. In summary, I sequenced the V3-V4 region of the 16S rRNA gene (soil DNA) on an Illumina MiSeq (paired-end reads).
Initially I had 6.8 million reads (6842673).
After running unique.seqs I had 1917220 unique sequences.
Then I aligned my primer pair (338F and S4) with a 16S rRNA reference sequence, and this customised alignment was aligned back with silva.bacteria.fasta, which reduced the number of columns from 50,000 to 17011…
After the pre.cluster run, 776405 unique sequences remained.
Then chimera.uchime ran for nearly three days. I followed the MiSeq SOP for the rest of the analysis, except for assessing the error rate, since I did not sequence a mock community.

Then the cluster.split command ran for 8 days but ended in failure. The cluster.split command was as follows:
mothur > cluster.split(fasta=Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15, processors=10)

The top part of the log file:
mothur > cluster.split(fasta=Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15, processors=10)

Using 10 processors.
Using splitmethod fasta.
Splitting the file…
/******************************************/
Running command: dist.seqs(fasta=Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.0.temp, processors=10, cutoff=0.155)

Using 10 processors.
/******************************************/

Output File Names:
Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.0.dist

It took 12521 seconds to calculate the distances for 19342 sequences.
/******************************************/
Running command: dist.seqs(fasta=Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.1.temp, processors=10, cutoff=0.155)

Using 10 processors.
/******************************************/

Output File Names:
Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.1.dist

It took 561 seconds to calculate the distances for 23947 sequences.
/******************************************/
Running command: dist.seqs(fasta=Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.2.temp, processors=10, cutoff=0.155)

Using 10 processors.
/******************************************/

Output File Names:
Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.2.dist

It took 143836 seconds to calculate the distances for 60467 sequences.
/******************************************/
Running command: dist.seqs(fasta=Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.3.temp, processors=10, cutoff=0.155)

Using 10 processors.
/******************************************/

Then it ended as follows:

Clustering Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.0.dist
********************#****#****#****#****#****#****#****#****#****#****#
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||
***********************************************************************

Clustering Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.14.dist
********************###########
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||

Clustering Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.8.dist
********************###########
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||

Clustering Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.10.dist
********************###########
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||

Clustering Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.11.dist
Cutoff was 0.155 changed cutoff to 0.06
********************###########
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Clustering Dilhani.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta.2.dist
Cutoff was 0.155 changed cutoff to 0.06
Cutoff was 0.155 changed cutoff to 0.06
Cutoff was 0.155 changed cutoff to 0.06
Cutoff was 0.155 changed cutoff to 0.06
[ERROR]: Could not open 24778.temp

My question: I ran this on a university server. The error resulted from running out of memory, but 64 GB was available to this process, which should have been more than enough. I know the V3-V4 region is not ideal, but I still have to analyse my data. Any suggestions you can give would be really appreciated.

Thank you.
Regards,
Dilhanide

pschloss · June 30, 2016, 6:11pm

I would encourage you to use the phylotype-based approach for reasons described here: http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix/
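
To give a rough sense of scale for the distance-matrix problem, here is a small Python sketch (not part of mothur) that counts the pairwise comparisons implied by the sequence counts in the log above. The function name `pairwise_comparisons` is made up for illustration, and the counts are upper bounds on the work dist.seqs has to do: how many distances are actually written to the .dist file depends on the cutoff and on how similar the sequences are.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered sequence pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Sequence counts copied from the dist.seqs lines in the log in the original post.
split_sizes = {"fasta.0": 19342, "fasta.1": 23947, "fasta.2": 60467}

for name, n in split_sizes.items():
    print(f"{name}: {n} sequences -> {pairwise_comparisons(n):,} pairs")

# For comparison: every unique sequence left after pre.cluster, with no splitting.
total_unique = 776405
print(f"unsplit: {total_unique} sequences -> {pairwise_comparisons(total_unique):,} pairs")
```

The number of pairs grows with the square of the number of sequences, so a single large split can dominate both the run time and the memory needed for clustering.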
