I would like to run the command tree.shared to see the similarity between 8 samples. All samples were sequenced using Ion Torrent. I have 237,000 sequences. I’m trying to use this workflow:

  1. unique.seqs
  2. align.seqs
  3. filter.seqs
  4. dist.seqs (cutoff 0.1, output=lt)
  5. cluster (cutoff 0.1, furthest neighbor method)
  6. make.shared
  7. tree.shared

However, step 4 (dist.seqs) produces a 58 GB output file, and the clustering step does not run properly (my computer cannot read the entire matrix). Could you help me or suggest an alternative?

Here are some details that may be useful.
Number of samples: 7
Number of sequences: 237,000
Average length: 150 bp
Sequencing platform: Ion Torrent, 318 chip
Other information: Barcoded

Computer specifications
RAM: 16 GB
Processors: 1 Intel Xeon 2.5 GHz

Best regards,

The problem is that you are running out of RAM. The cause is that you are using Ion Torrent, which turns out to be a horrible way to sequence 16S genes because it has an incredibly high sequencing error rate. Each error essentially creates a new unique sequence, so unique.seqs collapses very little and the distance matrix, which grows with the square of the number of unique sequences, becomes enormous. You might try following the 454 SOP with the quality scores a bit more closely (trim.seqs, pre.cluster, chimera.uchime, etc.), but I doubt it will help much.
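
To put rough numbers on that, here is a minimal Python sketch (not part of mothur). It assumes sequencing errors are independent with a fixed per-base rate, and the three error rates it loops over are hypothetical values chosen only for illustration. The read length and read count come from the question. The function names are made up for this example.

```python
def error_free_fraction(read_length: int, per_base_error: float) -> float:
    """Probability that a read contains no error at all, assuming
    independent errors with a fixed per-base rate (a simplification)."""
    return (1.0 - per_base_error) ** read_length


def lower_triangle_entries(n_unique: int) -> int:
    """Number of pairwise distances in a lower-triangle matrix
    built from n_unique sequences: n * (n - 1) / 2."""
    return n_unique * (n_unique - 1) // 2


read_length = 150       # average read length from the question
total_reads = 237_000   # number of sequences from the question

# Hypothetical per-base error rates, for illustration only.
for rate in (0.001, 0.005, 0.01):
    frac = error_free_fraction(read_length, rate)
    print(f"per-base error {rate}: {frac:.1%} of reads are error-free")

# Worst case: every read is unique, so nothing collapses in unique.seqs.
print("pairs if all reads are unique:", lower_triangle_entries(total_reads))

# Halving the number of unique sequences cuts the matrix to about a quarter.
print("pairs with half as many uniques:", lower_triangle_entries(total_reads // 2))
```

The point of the last two lines is that anything that reduces the number of unique sequences before dist.seqs (quality trimming, pre.cluster, chimera removal) pays off quadratically in matrix size, even if it cannot fully compensate for the error rate.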