Hi all,
I’m using mothur:
Linux version
Using ReadLine
Running 64Bit Version
mothur v.1.39.5
Last updated: 3/20/2017
I’m trying to cluster a large, highly diverse, highly multiplexed, unaligned environmental fungal ITS file with:
cluster(fasta=MiSeq.pick.pick.fasta, count=MiSeq.pick.pick.count_table, method=agc, cutoff=0.03, processors=32)
Clustering progresses normally and outputs two files, but then hangs.
Output in the mothur home directory:
MiSeq.pick.pick.fasta.sorted.fasta.temp.clustered.uc
MiSeq.pick.pick.fasta.sorted.fasta.temp.clustered.log
The log contains the vsearch log output:
vsearch v2.3.4_linux_x86_64, 252.2GB RAM, 32 cores
/mnt/shared/users/tf40403/mothur/mothur/vsearch --maxaccepts=16 --threads=32 --usersort --id=0.97 --minseqlength=30 --wordlength=8 --uc=MiSeq.pick.pick.fasta.sorted.fasta.temp.clustered.uc --cluster_smallmem=MiSeq.pick.pick.fasta.sorted.fasta.temp --maxrejects=64 --strand=both --log=MiSeq.pick.pick.fasta.sorted.fasta.temp.clustered.log --sizeorder
Started Sun Jul 2 15:14:57 2017
1629881570 nt in 5111917 seqs, min 150, max 403, avg 319
Alphabet nt
Word width 8
Word ones 8
Spaced No
Hashed No
Coded No
Stepped No
Slots 65536 (65.5k)
DBAccel 100%
Clusters: 753615 Size min 1, max 404079, avg 6.8
Singletons: 706642, 13.8% of seqs, 93.8% of clusters
Finished Sun Jul 2 17:12:51 2017
Elapsed time 117:54
Max memory 6.8GB
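One way to check whether vsearch's output is complete before mothur's conversion step starts is to tally the records in the .uc file and compare them with the log above (753615 clusters, 5111917 seqs, 706642 singletons). This is a minimal Python sketch, assuming the standard tab-separated, 10-column UC layout written by vsearch --uc (S = centroid, H = hit, C = cluster summary). `summarize_uc` is a hypothetical helper, not part of mothur or vsearch.

```python
import collections
import io


def summarize_uc(handle):
    """Count UC records by type and derive cluster statistics.

    Assumes the 10-column, tab-separated UC format: S = new cluster
    centroid, H = hit assigned to a cluster, C = per-cluster summary.
    """
    counts = collections.Counter()
    singletons = 0
    for line in handle:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\n").split("\t")
        record = fields[0]
        counts[record] += 1
        # C records carry the cluster size in the third column
        if record == "C" and int(fields[2]) == 1:
            singletons += 1
    return {
        "clusters": counts["S"],          # one S record per cluster
        "seqs": counts["S"] + counts["H"],  # every input seq is S or H
        "singletons": singletons,
    }


# Tiny in-memory example in the same layout (hypothetical labels)
_example = io.StringIO(
    "S\t0\t300\t*\t*\t*\t*\t*\tseqA\t*\n"
    "H\t0\t300\t99.0\t+\t0\t0\t300M\tseqB\tseqA\n"
    "S\t1\t310\t*\t*\t*\t*\t*\tseqC\t*\n"
    "C\t0\t2\t*\t*\t*\t*\t*\tseqA\t*\n"
    "C\t1\t1\t*\t*\t*\t*\t*\tseqC\t*\n"
)
print(summarize_uc(_example))  # {'clusters': 2, 'seqs': 3, 'singletons': 1}
```

Run against MiSeq.pick.pick.fasta.sorted.fasta.temp.clustered.uc instead of the in-memory example, counts that match the log would suggest the .uc file is complete and the stall happens after vsearch has finished.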
I’ve used the cluster(method=agc) command of this mothur installation before, and it worked fine.
I’m not sure what is going on here. Either the size and complexity of the dataset completely prevent the conversion of the vsearch output to a mothur list file, or they slow it down to a snail’s pace.
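To tell which of the two it is, the conversion can be approximated outside mothur. Below is a minimal Python sketch, not mothur's own code: it groups sequence names by the cluster number in S and H records in one linear pass (a dict lookup per record, so roughly 5 million cheap steps here) and renders a single-row, mothur-style list file. Stripping a `;size=N;` abundance suffix from the labels, the zero-padded `OtuNNN` header and the helper names `uc_to_otus`/`format_list` are all assumptions for illustration.

```python
import collections
import io


def uc_to_otus(handle):
    """Group sequence names into OTUs by the cluster number in S/H records."""
    otus = collections.defaultdict(list)
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if fields[0] not in ("S", "H"):
            continue  # skip C (summary), N (no hit) and blank lines
        # Assumption: labels may carry a ';size=N;' suffix; keep the bare name.
        name = fields[8].split(";", 1)[0]
        otus[int(fields[1])].append(name)
    return [otus[k] for k in sorted(otus)]


def format_list(otus, label="0.03"):
    """Render OTUs as a mothur-style list file: header row plus one data row.

    The 'label / numOtus / OtuNNN' layout mimics mothur list files; the
    exact label padding mothur uses is an assumption.
    """
    width = len(str(len(otus)))
    header = ["label", "numOtus"] + [f"Otu{i:0{width}d}" for i in range(1, len(otus) + 1)]
    row = [label, str(len(otus))] + [",".join(names) for names in otus]
    return "\t".join(header) + "\n" + "\t".join(row) + "\n"


# Tiny in-memory example with hypothetical abundance-annotated labels
_example = io.StringIO(
    "S\t0\t300\t*\t*\t*\t*\t*\tseqA;size=5;\t*\n"
    "H\t0\t300\t99.0\t+\t0\t0\t300M\tseqB;size=2;\tseqA;size=5;\n"
    "S\t1\t310\t*\t*\t*\t*\t*\tseqC;size=1;\t*\n"
    "C\t0\t2\t*\t*\t*\t*\t*\tseqA;size=5;\t*\n"
)
print(format_list(uc_to_otus(_example)), end="")
```

If this finishes in minutes on the real .uc file while mothur is still stalled, the slowdown is more likely in mothur's conversion step than in the data size itself; that is a hypothesis to test, not a diagnosis.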
Any comments are greatly appreciated.
Tom