VSEARCH error when constructing listfile

Hey again,

I’m playing with the new vsearch clustering option. After get.oturep and make.shared failed repeatedly, I suspect something is wrong with the way cluster(method=agc) constructs the listfile, since make.shared reports that the listfile contains more sequences than the original fasta file:

complete_set.unique.good.filter.unique.precluster.pick.fasta (206,950 seqs)
complete_set.unique.good.filter.unique.precluster.denovo.uchime.pick.count_table (206,950 seqs)

Running cluster(method=agc) on these files gives the following output:

You did not set a cutoff, using 0.03.
vsearch v1.11.1_linux_x86_64, 15.6GB RAM, 8 cores
Reading file complete_set.unique.good.filter.unique.precluster.pick.fasta.sorted.fasta.temp 100%
89705266 nt in 206950 seqs, min 397, max 453, avg 433 # <- vsearch reads the correct number of sequences
Clusters: 32062 Size min 1, max 1923, avg 6.5
Singletons: 21977, 10.6% of seqs, 68.5% of clusters
It took 512 seconds to cluster
Output File Names:

  • Then I run make.shared, which returns a long list of errors of the form
    [ERROR]: S23S244_6302 is in your listfile and not in your count file. Please correct.
    along with
    Your group file contains 206950 sequences and list file contains 239012 sequences. Please correct.

  • So the listfile now reportedly contains 239012 sequences. None of the files in the pipeline has this exact number, and I checked with get.current() that nothing else is being used.

I also checked for some of the sequences in both files, and as far as I can tell they are present in both.
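For anyone who wants to repeat this kind of check across the whole file instead of a few names, here is a minimal Python sketch. It is not part of mothur, and the function names are made up for illustration. It assumes the usual tab-separated layouts: a list file row of label, numOtus, then one comma-separated group of names per OTU (with an optional header row starting with "label"), and a count table whose first column holds the sequence name under a "Representative_Sequence" header. It also reports names that occur more than once in the list file, which is only one possible way the list total could exceed the count table and is not something diagnosed in this thread.

```python
import io
from collections import Counter


def read_list_names(handle):
    """Return every sequence name from the first data row of a list file."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines and the optional header row (assumed layout).
        if len(fields) < 3 or fields[0] == "label":
            continue
        names = []
        for otu in fields[2:]:
            names.extend(name for name in otu.split(",") if name)
        return names
    return []


def read_count_names(handle):
    """Return the set of names in the first column of a count table."""
    names = set()
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        first = line.split("\t", 1)[0].strip()
        if first == "Representative_Sequence":
            continue  # header row
        names.add(first)
    return names


def compare(list_names, count_names):
    """Summarise how the list file and count table disagree."""
    counts = Counter(list_names)
    return {
        "list_total": len(list_names),
        "list_unique": len(counts),
        "duplicated": sorted(n for n, c in counts.items() if c > 1),
        "missing_from_count": sorted(set(counts) - count_names),
        "missing_from_list": sorted(count_names - set(counts)),
    }


# Toy data standing in for the real files (hypothetical names).
list_text = "label\tnumOtus\tOtu1\tOtu2\n0.03\t2\tseqA,seqB\tseqC,seqA\n"
count_text = (
    "Representative_Sequence\ttotal\tG1\n"
    "seqA\t3\t3\nseqB\t1\t1\nseqD\t2\t2\n"
)

report = compare(
    read_list_names(io.StringIO(list_text)),
    read_count_names(io.StringIO(count_text)),
)
print(report)
```

To run it on real data, replace the two io.StringIO objects with open() calls on the list file and the count_table. If list_total is larger than list_unique, some names are listed more than once.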


Thanks for reporting this bug. I have fixed it, and the latest release is available here: https://github.com/mothur/mothur/releases/tag/v1.37.3.