Using cluster.split with large data

Hi,

I am having a few problems running this command due to the large number of unique sequences I have (776,819). I have tried running it on only one processor to reduce RAM usage, and I have also tried splitting at lower taxonomic levels, but it still crashes. I am currently trying dist.seqs instead, but I am doubtful this will be of much use, as it is taking a very long time to run (although it hasn't crashed yet after three days).
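For reference, the command I have been running looks roughly like this (file names are placeholders for the SOP outputs, and I have been varying taxlevel and processors):

cluster.split(fasta=example.fasta, count=example.count_table, taxonomy=example.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.03, processors=1)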

I have 64 samples run on a MiSeq with 250 bp reads covering the V2-V3 region. I have been following the MiSeq SOP exactly up until this point.

I would rather not fall back on a phylotype-based approach unless I really have to.

Do you have any suggestions? Also, are there any analyses where it is possible to use the taxonomically assigned read data from before clustering into OTUs?

I used:

split.abund(fasta=example.fasta, count=example.count_table, cutoff=1)

to remove singletons in a similar situation. You would then proceed to cluster on the abund.fasta and abund.count_table. It is not uncommon in the literature to take this approach, since singletons are the reads most likely to contain errors anyway.
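To make the hand-off concrete (file names are placeholders and the cluster.split parameters are just illustrative): split.abund writes the retained sequences to example.abund.fasta and example.abund.count_table, with the singletons going to the rare files. You may also need to subset your taxonomy so it matches the retained sequences before clustering, e.g.:

list.seqs(fasta=example.abund.fasta)
get.seqs(accnos=example.abund.accnos, taxonomy=example.taxonomy)
cluster.split(fasta=example.abund.fasta, count=example.abund.count_table, taxonomy=example.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.03)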

If you still have too many uniques, consider using sub.sample() to further reduce your counts.
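For example (the size here is purely illustrative; persample=true subsamples each group to that size using the group information in the count table):

sub.sample(fasta=example.abund.fasta, count=example.abund.count_table, size=5000, persample=true)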

Also, that is a suspiciously high number of uniques, even with 64 samples. What quality filtering/trimming are you doing prior to running mothur?

Thanks a lot, that completely solved my problem! Using split.abund I went from ~700,000 unique sequences to ~25,000. Much more manageable.