I’m playing with the new vsearch clustering option. After get.oturep and make.shared kept failing, I think something may be wrong with the way cluster(method=agc) constructs the listfile, because make.shared reports that the listfile contains more sequences than the original fasta file:
complete_set.unique.good.filter.unique.precluster.pick.fasta (206,950 seqs)
complete_set.unique.good.filter.unique.precluster.denovo.uchime.pick.count_table (206,950 seqs)
Running cluster(method=agc) on these files gives the following output:
You did not set a cutoff, using 0.03.
vsearch v1.11.1_linux_x86_64, 15.6GB RAM, 8 cores
Reading file complete_set.unique.good.filter.unique.precluster.pick.fasta.sorted.fasta.temp 100%
89705266 nt in 206950 seqs, min 397, max 453, avg 433 # <- vsearch reads the correct number of sequences
Clusters: 32062 Size min 1, max 1923, avg 6.5
Singletons: 21977, 10.6% of seqs, 68.5% of clusters
It took 512 seconds to cluster
Output File Names:
Then I run make.shared, and it returns a long list of errors like this:
[ERROR]: S23S244_6302 is in your listfile and not in your count file. Please correct.
Your group file contains 206950 sequences and list file contains 239012 sequences. Please correct.
The listfile now reportedly contains 239012 sequences. None of the files in the pipeline has exactly this number, and I checked with get.current() that no other files are being used.
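The surplus in the listfile is exactly the cluster count reported by vsearch: 239,012 - 206,950 = 32,062, the same number as the "Clusters: 32062" line. That suggests each OTU carries one extra entry, for example a name repeated within or across OTUs, although this is only an inference from the numbers. Below is a minimal Python sketch to test it. It assumes the usual tab-separated mothur list layout (label, number of OTUs, then one column per OTU holding comma-separated sequence names, with an optional header row whose first field is "label"). The script and function names are hypothetical, not part of mothur.

```python
import sys
from collections import Counter


def list_name_stats(lines):
    """Summarise each label (distance line) of a mothur-style list file.

    Assumed layout: label <tab> numOtus <tab> OTU1 <tab> OTU2 ...
    where each OTU column is a comma-separated list of sequence names.
    A header row starting with 'label' and blank lines are skipped.
    """
    stats = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] in ("", "label"):
            continue
        label, otus = fields[0], fields[2:]
        # Flatten every OTU column into one list of names (empty strings dropped).
        names = [name for otu in otus for name in otu.split(",") if name]
        counts = Counter(names)
        # Names that occur more than once anywhere on this line.
        duplicated = sorted(n for n, c in counts.items() if c > 1)
        stats[label] = {
            "otus": len(otus),
            "names": len(names),
            "unique_names": len(counts),
            "duplicated": duplicated,
        }
    return stats


if __name__ == "__main__":
    # Usage (hypothetical script name): python list_stats.py <your .list file>
    with open(sys.argv[1]) as handle:
        for label, s in list_name_stats(handle).items():
            print(label, s["otus"], "OTUs,", s["names"], "names,",
                  s["unique_names"], "unique,",
                  len(s["duplicated"]), "names listed more than once")
```

If the extra entries are repeated names, the number of duplicated names should come out roughly equal to the 32,062 OTUs. If it is zero, the surplus entries are distinct strings, and a name-by-name comparison with the count table is the more useful next step.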
I also checked for some of these sequences in both files, and as far as I can tell they are present in both.
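Spot-checking a few names can miss the problem, so a full set comparison between the listfile and the count table is a more systematic check. The sketch below is a hypothetical helper, not a mothur command. It assumes the count table's first tab-separated column is the sequence name (header row "Representative_Sequence", any "#" lines skipped) and that the listfile line of interest has the label 0.03. Printing mismatched names with repr() exposes trailing spaces or other invisible characters, which is one possible reason a name can look present in both files yet fail to match.

```python
import sys


def count_table_names(lines):
    """Return the set of sequence names in a mothur count table.

    Assumes the name is the first tab-separated column; skips blank lines,
    '#' comment lines, and the 'Representative_Sequence' header.
    """
    names = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        first = line.split("\t", 1)[0]
        if first != "Representative_Sequence":
            names.add(first)
    return names


def list_names(lines, label="0.03"):
    """Return the set of sequence names on one label line of a list file."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] == label:
            return {n for otu in fields[2:] for n in otu.split(",") if n}
    raise ValueError(f"label {label!r} not found in list file")


def compare(list_lines, count_lines, label="0.03"):
    """Return (names only in the list file, names only in the count table)."""
    in_list = list_names(list_lines, label)
    in_count = count_table_names(count_lines)
    return sorted(in_list - in_count), sorted(in_count - in_list)


if __name__ == "__main__":
    # Usage (hypothetical script name):
    #   python compare_names.py <your .list file> <your .count_table file>
    with open(sys.argv[1]) as lst, open(sys.argv[2]) as cnt:
        only_list, only_count = compare(lst, cnt)
    print(len(only_list), "names only in the list file")
    print(len(only_count), "names only in the count table")
    for name in only_list[:10]:
        print(repr(name))  # repr() makes hidden whitespace visible
```

Because both sides are compared as sets, a name that is merely repeated in the listfile will not show up here; that case needs a per-name count rather than a set difference.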