Re-posting in case this went unnoticed.
I’m getting a discrepancy when counting the sequences after pre.cluster when using a count_table vs a names file.
What could be the cause?
Before running pre.cluster():
The count_table and names files have exactly the same # of sequences:
summary.seqs(fasta=XXX.fasta, name=XXX.names)
# of unique seqs: 27096
total # of seqs: 348502
summary.seqs(fasta=XXX.fasta, count=XXX.count_table)
# of unique seqs: 27096
total # of seqs: 348502
After running pre.cluster():
pre.cluster(fasta=XXX.fasta, count=XXX.count_table, diffs=3)
I use the new fasta and count_table files:
summary.seqs(fasta=XXX.precluster.fasta, count=XXX.precluster.count_table)
and get:
# of unique seqs: 9165
total # of seqs: 348502
Or, if I use the new fasta file with the old names file (I have no new names file):
summary.seqs(fasta=XXX.precluster.fasta, name=XXX.names)
and get:
# of unique seqs: 9165
total # of seqs: 281982
What’s happening? pre.cluster() isn’t supposed to reduce the total number of sequences, only the number of unique sequences.
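A toy Python model (not mothur's code) may help frame the question. It assumes that summary.seqs computes the total by summing, for each sequence in the fasta file, the abundance recorded for that sequence in whichever names or count file is supplied, and that pre.cluster folds each merged sequence's abundance into its representative in the new count_table. The sequence names, the merge, and the helper functions are all hypothetical. Under those assumptions, the updated count_table keeps the total unchanged, while the old names file only knows each surviving representative's pre-merge group size.

```python
# Hypothetical names-file data: representative name -> member read names.
names = {
    "seqA": ["seqA", "r1", "r2"],        # 3 reads
    "seqB": ["seqB", "r3"],              # 2 reads
    "seqC": ["seqC", "r4", "r5", "r6"],  # 4 reads
}

# Equivalent count_table view: representative name -> abundance.
counts = {rep: len(members) for rep, members in names.items()}


def precluster_merge(counts, merges):
    """Simplified model: fold each merged sequence's abundance into its representative."""
    new_counts = dict(counts)
    for merged, rep in merges.items():
        new_counts[rep] += new_counts.pop(merged)
    return new_counts


def total_from_counts(fasta_ids, counts):
    # Sum the count_table abundance of every sequence present in the fasta.
    return sum(counts[i] for i in fasta_ids)


def total_from_names(fasta_ids, names):
    # Sum the names-file group size of every sequence present in the fasta.
    return sum(len(names[i]) for i in fasta_ids)


merges = {"seqB": "seqA"}  # hypothetical: seqB lies within diffs of seqA
new_counts = precluster_merge(counts, merges)
fasta_ids = list(new_counts)  # the preclustered fasta keeps only representatives

print(total_from_counts(list(counts), counts))   # before: 9
print(total_from_counts(fasta_ids, new_counts))  # after, new count_table: 9
print(total_from_names(fasta_ids, names))        # after, old names file: 7
```

In this model the names-file total drops because seqB's two reads were moved into seqA only in the new count table; the old names file still credits seqA with its original three reads.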
Using mothur v.1.32.0
Thanks!