About the cluster command

Hi everyone,

I tried to cluster my MiSeq sequences into OTUs with the cluster command, but this step always fails without any explanation, so I would like to ask for help. Beforehand, I ran the dist.seqs command as described in the MiSeq SOP. I followed almost every step of the MiSeq SOP except assessing the error rate, because I don’t have the database needed for that. Instead, I ran dist.seqs on the fasta file from the step just before the error-assessment step. The command was (I have 12 processors, so I used all of them):

dist.seqs(fasta=file.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, cutoff=0.20, processors=12)

It took about 3.7 days to finish and generated the file:

file.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist
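Before clustering, it can help to check how large that .dist file actually is. The Python sketch below assumes the column (three-field) format that dist.seqs writes by default, one "seqA seqB distance" entry per line, and streams the file so the matrix is never held in memory. The function name summarize_column_dist and the script name are hypothetical, not part of mothur.

```python
import sys


def summarize_column_dist(path):
    """Stream a three-column distance file (seqA seqB distance).

    Returns (stored pairs, distinct sequence names, largest distance).
    Sequences with no partner below the cutoff never appear in the file,
    so the name count can be lower than the number of unique sequences.
    """
    pairs = 0
    names = set()
    max_dist = 0.0
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            seq_a, seq_b, dist = fields
            names.add(seq_a)
            names.add(seq_b)
            max_dist = max(max_dist, float(dist))
            pairs += 1
    return pairs, len(names), max_dist


if __name__ == "__main__" and len(sys.argv) > 1:
    n_pairs, n_names, largest = summarize_column_dist(sys.argv[1])
    print(f"{n_pairs} pairs, {n_names} sequences, max distance {largest}")
```

Usage, with a hypothetical script name: python summarize_dist.py file.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist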

Then I ran the cluster command:

cluster(column=file.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist, count=file.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table)

However, it always gave me something like this:

********************###########
Reading matrix: ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||Killed

It also exited the mothur environment on its own, with no other error message.
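A bare "Killed" at the end of the progress bar is typically what the shell prints when a process receives SIGKILL, and on Linux that is often the out-of-memory (OOM) killer. A minimal Python sketch, assuming a POSIX system, that runs a command and reports whether it was ended by a signal (subprocess reports signal N as a return code of -N). The helper name run_and_report is hypothetical; on Linux, the kernel log (dmesg) may also record an out-of-memory kill.

```python
import signal
import subprocess


def run_and_report(cmd):
    """Run a command; report whether it exited or was killed by a signal."""
    result = subprocess.run(cmd)
    if result.returncode < 0:
        # On POSIX, a negative return code -N means the child got signal N.
        sig = signal.Signals(-result.returncode)
        return f"killed by {sig.name}"
    return f"exited with status {result.returncode}"
```

For example, run_and_report(["mothur", "cluster.batch"]) would run mothur in batch mode on a hypothetical batch file containing the cluster command; "killed by SIGKILL" points toward a memory problem rather than a mothur error.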

Because of this, I also tried the alternative approach, the cluster.split command, several times. This is what I ran:

cluster.split(fasta=file.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=file.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=file.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15, processors=8)

This follows what the MiSeq SOP describes.
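The idea behind splitmethod=classify is that sequences in different taxa at the chosen taxlevel are not compared with each other, so several small matrices replace one large one. The Python sketch below only illustrates that arithmetic and is not mothur's implementation; it assumes taxonomy strings in the style of a .wang.taxonomy file (e.g. "Bacteria(100);Firmicutes(100);..."), and the helper names are hypothetical.

```python
import re
from collections import Counter

# Bootstrap confidences such as "(100)" that follow each rank name.
CONFIDENCE = re.compile(r"\(\d+\)")


def taxon_at_level(taxonomy, level):
    """Return the first `level` ranks of a semicolon-separated taxonomy."""
    ranks = [r for r in CONFIDENCE.sub("", taxonomy).split(";") if r]
    return ";".join(ranks[:level])


def pair_counts(taxonomies, level):
    """Pairs needed for one global matrix vs. one matrix per taxon group."""
    n = len(taxonomies)
    groups = Counter(taxon_at_level(t, level) for t in taxonomies)
    unsplit = n * (n - 1) // 2
    split = sum(k * (k - 1) // 2 for k in groups.values())
    return unsplit, split, len(groups)
```

Splitting only helps in proportion to how evenly sequences spread across groups: if one order holds most of the sequences, its matrix is still nearly as large as the unsplit one.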

After several days of running, it had produced many dist files and temp files. However, it never seemed to finish, and it printed hundreds of error messages like:

[ERROR]: Could not open file.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table.113.temp

It has been stuck at this point for many days now. When I check the server, only one CPU is still running mothur, and nothing is changing anymore (when it started, all 8 CPUs were busy). I also doubt it is normal to see so many error messages, and the run appears to have stalled.

I have run these two commands (cluster and cluster.split) many times, but neither of them works, and I have been stuck at this step for more than half a month. Could someone advise me on how to generate a list file?

Thank you so much!

Best Wishes,

Chih

What region are you sequencing and on what platform? How much RAM does your computer have?

Hi Patrick,

I sequenced the bacterial 16S V3-V4 region on a MiSeq. Here is the memory information for my server:

MemTotal: 20551516 kB
MemFree: 19418348 kB
MemAvailable: 19706404 kB
Buffers: 71800 kB
Cached: 333928 kB
SwapCached: 3948 kB
Active: 373372 kB
Inactive: 51604 kB
Active(anon): 7400 kB
Inactive(anon): 11888 kB
Active(file): 365972 kB
Inactive(file): 39716 kB
Unevictable: 32 kB
Mlocked: 32 kB
SwapTotal: 20959228 kB
SwapFree: 20659160 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 18680 kB
Mapped: 28288 kB
Shmem: 76 kB
Slab: 254984 kB
SReclaimable: 215228 kB
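The /proc/meminfo values above are in kB (kibibytes), which makes them hard to compare with a matrix size at a glance. A small Python sketch that parses this format and converts to GiB; the function names are hypothetical, and reading /proc/meminfo only works on Linux.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Memory fields are in kB; a few counters (e.g. HugePages_Total) have no unit.
    """
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        parts = value.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def kb_to_gib(kb):
    """Convert the kB (KiB) values in /proc/meminfo to GiB."""
    return kb / 1024 ** 2


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        mem = parse_meminfo(handle.read())
    for field in ("MemTotal", "MemAvailable", "SwapTotal"):
        print(f"{field}: {kb_to_gib(mem.get(field, 0)):.1f} GiB")
```

Applied to the numbers in this post, MemTotal works out to about 19.6 GiB of RAM and SwapTotal to about 20 GiB of swap.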


I still haven't stopped the cluster.split command, so it is still running, but nothing has changed since then.

Thank you for your consideration!!

Chih

I suspect you’re running into memory issues since your reads do not fully overlap. I’d encourage you to read this…

http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix%3F/
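The memory problem grows quickly because the number of pairwise distances scales with the square of the number of unique sequences, so anything that inflates the unique count (such as errors from poor-quality, not fully overlapping reads) inflates the matrix much faster. A minimal Python sketch of that upper bound; bytes_per_entry in rough_size_gb is a placeholder assumption you supply, not a measured mothur figure, and the real matrix is smaller because the cutoff drops distant pairs.

```python
def max_pairs(n_unique):
    """Upper bound on stored distances for n unique sequences:
    every pair once, i.e. nothing dropped by the cutoff."""
    return n_unique * (n_unique - 1) // 2


def rough_size_gb(n_unique, bytes_per_entry):
    """Rough size in GB; bytes_per_entry is an assumed, user-supplied value."""
    return max_pairs(n_unique) * bytes_per_entry / 1e9


for n in (10_000, 50_000, 100_000):
    # 10x more unique sequences means roughly 100x more pairs.
    print(n, max_pairs(n))
```

For instance, 10,000 unique sequences give at most 49,995,000 pairs, while 100,000 give up to 4,999,950,000.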

Thanks:)

Hi,

I ran into the same problem with cluster.split: it was killed with [ERROR]: Could not open 16746.temp…
I have 500 GB of memory and 20 cores. Is this a memory issue?

Thanks,
Junnie

Please read the blog post that I linked to above. It’s a memory problem that is caused by poor sequence quality.

Pat


I’m running into the same issue. I’m using 16 cores and 32 GB of RAM, and the link mentioned above is not working.

DC7
