I am having issues running cluster with vsearch (method=agc). My fasta file is aligned and preclustered, and it is huge (~7M sequences). Vsearch finishes clustering and produces the .uc and .log files; the largest cluster has over 3.7M sequences. However, the list file is never created, and mothur appears to just sit there doing nothing.
Here is what is in the log file:
vsearch v2.17.1_linux_x86_64, 92.9GB RAM, 32 cores
/opt/mothur/1.48.0/prebuilt//vsearch --maxaccepts=16 --threads=32 --usersort --id=0.94 --minseqlength=30 --wordlength=8 --uc=Beavers_B.trim.contigs.good.filter.good.precluster.fasta.sorted.fasta.temp.clustered.uc --cluster_smallmem=Beavers_B.trim.contigs.good.filter.good.precluster.fasta.sorted.fasta.temp --maxrejects=64 --strand=both --log=Beavers_B.trim.contigs.good.filter.good.precluster.fasta.sorted.fasta.temp.clustered.log --sizeorder
Started Wed Jan 22 11:58:40 2025
2224429547 nt in 7324114 seqs, min 240, max 602, avg 304
Alphabet nt
Word width 8
Word ones 8
Spaced No
Hashed No
Coded No
Stepped No
Slots 65536 (65.5k)
DBAccel 100%
Clusters: 9309 Size min 1, max 3790290, avg 786.8
Singletons: 4966, 0.1% of seqs, 53.3% of clusters
Finished Wed Jan 22 12:20:41 2025
Elapsed time 22:01
Max memory 6.0GB
The fasta file is 26.8 GB and the count file is close to 0.8 GB. Do you think this is a memory issue?
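In case it helps with diagnosing this, here is a minimal Python sketch for checking that the .uc file is complete by tallying cluster sizes and comparing them with the numbers in the log (9309 clusters, 7324114 seqs, max 3790290, 4966 singletons). It assumes the usual tab-separated .uc layout, with the record type (S, H, or C) in the first column and the cluster number in the second. The function names cluster_sizes and summarize are just made up for this example and are not part of mothur or vsearch.

```python
import sys
from collections import Counter


def cluster_sizes(uc_lines):
    """Count members per cluster from .uc records.

    Only S (seed) and H (hit) records are counted, one per sequence.
    C (cluster summary) records are skipped to avoid double counting.
    """
    sizes = Counter()
    for line in uc_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or fields[0] not in ("S", "H"):
            continue
        sizes[int(fields[1])] += 1
    return sizes


def summarize(sizes):
    """Return the same kind of totals that the vsearch log reports."""
    values = list(sizes.values())
    return {
        "clusters": len(values),
        "seqs": sum(values),
        "max": max(values, default=0),
        "singletons": sum(1 for v in values if v == 1),
    }


if __name__ == "__main__":
    # Usage: python uc_summary.py <file>.uc
    with open(sys.argv[1]) as handle:
        print(summarize(cluster_sizes(handle)))
```

The file is read line by line, so memory use should depend on the number of clusters rather than the number of sequences. If the totals match the log, the .uc file is probably intact and the stall is more likely happening afterwards, when the list file is built.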