tree.shared

Diegoassis · July 3, 2013, 7:32pm

Hi,
I would like to run the command tree.shared to see the similarity between 8 samples. All samples were sequenced using Ion Torrent. I have 237,000 sequences. I’m trying to use this workflow:

unique.seqs
align.seqs
filter.seqs
dist.seqs (cutoff 0.1, output=lt)
cluster.seqs (cutoff 0.1, furthest method)
make.shared
tree.shared

However, step 4 (dist.seqs) creates a 58 GB output file, and the clustering step does not run properly (my computer cannot read the whole matrix). Could you help me or suggest anything?
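To put the matrix size in perspective, here is a rough sketch of how the number of pairwise distances grows with the number of sequences. It assumes a lower-triangle matrix that stores one distance per pair. The names `pairwise_count` and `estimated_size_gb` are made up for illustration, and `bytes_per_entry` is a placeholder assumption, not a measured property of the dist.seqs output. The post gives only the total read count, not the number of unique sequences, so 237,000 serves as an upper bound.

```python
def pairwise_count(n_seqs: int) -> int:
    """Number of distinct sequence pairs, i.e. entries in a lower-triangle matrix."""
    return n_seqs * (n_seqs - 1) // 2


def estimated_size_gb(n_seqs: int, bytes_per_entry: int) -> float:
    """Rough file size in GB; bytes_per_entry is an assumption, not a mothur constant."""
    return pairwise_count(n_seqs) * bytes_per_entry / 1e9


if __name__ == "__main__":
    total_reads = 237_000  # total reads from the post; unique count is not given
    for n in (total_reads, total_reads // 2, total_reads // 10):
        print(f"{n:>7} sequences -> {pairwise_count(n):,} pairs")
```

Because the count grows with the square of the number of sequences, halving the number of unique sequences cuts the matrix to roughly a quarter of its size.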

Below is some information that could be useful.
Metadata
Number of samples: 7
Number of sequences: 237,000
Average length: 150 bp
Sequencing platform: Ion Torrent, 318 chip
Other information: Barcode

Computer features
RAM: 16 GB
Processors: 1 Intel Xeon 2.5 GHz

Best regards,

pschloss · July 9, 2013, 11:58am

The problem is that you are running out of RAM. The cause is that you are using Ion Torrent, which turns out to be a horrible way to sequence 16S genes because it has an incredibly high sequencing error rate. Each error essentially creates a new unique sequence. You might try following the 454 SOP more closely, paying attention to the quality scores (trim.seqs, pre.cluster, chimera.uchime, etc.), but I doubt it will help much.
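As a rough illustration of why errors inflate the number of unique sequences, the sketch below computes the fraction of reads expected to contain no errors at all. It assumes errors are independent and equally likely at every base, which is a simplification. The function name `error_free_fraction` and the error rates in the loop are hypothetical values chosen for illustration, not measured Ion Torrent figures. The read length of 150 bp comes from the question.

```python
def error_free_fraction(read_length: int, per_base_error: float) -> float:
    """Probability that a read has no errors, assuming independent per-base errors."""
    return (1.0 - per_base_error) ** read_length


if __name__ == "__main__":
    read_length = 150  # average read length from the question
    for rate in (0.001, 0.005, 0.01):  # hypothetical per-base error rates
        frac = error_free_fraction(read_length, rate)
        print(f"error rate {rate:.3f}: {frac:.1%} of reads error-free")
```

Every read that is not error-free is a candidate new unique sequence, so even a modest per-base error rate can leave most reads unmerged by unique.seqs.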
