Names file vs count_table after pre.cluster()

roey.angel · November 19, 2013, 5:25pm

Re-posting in case this went unnoticed.

I’m getting a discrepancy when counting the sequences after pre.cluster when using a count_table vs a names file.
What could be the cause?

Before running pre.cluster()
The count_table and names files have exactly the same # of sequences:

summary.seqs(fasta=XXX.fasta, name=XXX.names)

# of unique seqs: 27096

total # of seqs: 348502

summary.seqs(fasta=XXX.fasta, count=XXX.count_table)

# of unique seqs: 27096

total # of seqs: 348502
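Both files encode the same information before pre.cluster, so both runs report the same totals. The sketch below is a minimal illustration of how "unique" and "total" can be derived from each format. It assumes the plain (uncompressed) layouts: a names file with `representative<TAB>comma-separated members` (the member list includes the representative itself), and a count_table with a `Representative_Sequence<TAB>total<TAB>groups...` header followed by one row per unique sequence. The function names are hypothetical and this is not mothur's own code.

```python
import io


def summarize_names(handle, fasta_ids=None):
    """Return (unique, total) for a names file (assumed format, see above).

    If fasta_ids is given, lines whose representative is not in that set
    are skipped.
    """
    unique = total = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        rep, members = line.split("\t")
        if fasta_ids is not None and rep not in fasta_ids:
            continue
        unique += 1
        total += len(members.split(","))  # every listed read counts once
    return unique, total


def summarize_count_table(handle, fasta_ids=None):
    """Return (unique, total) for an uncompressed count_table (assumed format)."""
    next(handle)  # skip the Representative_Sequence/total/groups header
    unique = total = 0
    for line in handle:
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue
        rep, count = fields[0], int(fields[1])  # column 2 holds the total count
        if fasta_ids is not None and rep not in fasta_ids:
            continue
        unique += 1
        total += count
    return unique, total


names_file = io.StringIO("seqA\tseqA,seqB,seqC\nseqD\tseqD\n")
count_table = io.StringIO(
    "Representative_Sequence\ttotal\tg1\tg2\n"
    "seqA\t3\t2\t1\n"
    "seqD\t1\t0\t1\n"
)
print(summarize_names(names_file))         # (2, 4)
print(summarize_count_table(count_table))  # (2, 4)
```

The optional `fasta_ids` filter is there to model a fasta file that no longer contains every representative, which is the situation after pre.cluster.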

After running pre.cluster():

pre.cluster(fasta=XXX.fasta, count=XXX.count_table, diffs=3)

I use the new fasta and count_table files:

summary.seqs(fasta=XXX.precluster.fasta, count=XXX.precluster.count_table)

and get:

# of unique seqs: 9165

total # of seqs: 348502

Or the new fasta file with the old names file (I have no new names file):

summary.seqs(fasta=XXX.precluster.fasta, name=XXX.names)

and get:

# of unique seqs: 9165

total # of seqs: 281982
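Putting the reported figures side by side quantifies the discrepancy. The count_table run keeps all 348502 reads, while the run with the old names file is short by a fixed number of reads; meanwhile 17931 unique sequences are no longer in the pre-clustered fasta. That the missing reads are exactly the ones listed under those removed uniques is an assumption here; the reply below gives the mechanism.

```python
# Figures copied from the summary.seqs output in this post.
before_unique, before_total = 27096, 348502
after_unique = 9165
after_total_count_table = 348502
after_total_stale_names = 281982

# Unique sequences that pre.cluster removed from the fasta.
dropped_uniques = before_unique - after_unique
# Reads that the old names file no longer accounts for.
missing_reads = after_total_count_table - after_total_stale_names

print(dropped_uniques)  # 17931
print(missing_reads)    # 66520
```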

What’s happening? pre.cluster() isn’t supposed to reduce the number of sequences, only the number of unique sequences.

Using mothur v.1.32.0

Thanks!

westcott · November 19, 2013, 5:46pm

Could you send your fasta, name and count file to mothur.bugs@gmail.com?

westcott · November 20, 2013, 2:42pm

The mismatch in totals is not a bug. You are using the pre-clustered fasta file with a names file that was not pre-clustered. In the count file, as each sequence was merged into its representative, its count was added to the representative's count, so the overall sequence total is preserved. In the names file, no merging was done. When summary.seqs reads the names file, the entries for sequences that pre.cluster merged away are ignored (together with all the reads they list) because those sequences are not in the pre-clustered fasta file. To compare the two files, you would want to run:

summary.seqs(fasta=yourFasta, name=yourName)
summary.seqs(fasta=yourFasta, count=yourCount)
pre.cluster(fasta=yourFasta, count=yourCount, diffs=3)
pre.cluster(fasta=yourFasta, name=yourName, group=yourGroup, diffs=3)
summary.seqs(fasta=yourPreclusteredFasta, name=yourPreclusteredName)
summary.seqs(fasta=yourPreclusteredFasta, count=yourPreclusteredCount)
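The accounting described in this reply can be reproduced with a toy example. The sketch below uses made-up sequence names and a hand-picked merge (`seqC` folded into `seqA`); it does not implement pre.cluster's actual distance-based merging, only the bookkeeping that follows a merge. It shows that the count-based total is preserved, that an old names file filtered by the new fasta loses the merged sequence's reads, and that a names file with the same merge applied agrees with the count table.

```python
def counts_from_names(names):
    """Abundance of each representative = size of its member list."""
    return {rep: len(members) for rep, members in names.items()}


def merge_counts(counts, merges):
    """Fold each merged sequence's abundance into its representative.

    merges maps merged_sequence -> representative. In this toy version a
    representative must not itself appear as a key in merges.
    """
    merged = dict(counts)
    for seq, rep in merges.items():
        merged[rep] += merged.pop(seq)
    return merged


def merge_names(names, merges):
    """Apply the same merge to a names mapping by concatenating member lists."""
    merged = {rep: list(members) for rep, members in names.items()}
    for seq, rep in merges.items():
        merged[rep].extend(merged.pop(seq))
    return merged


def total_in_fasta(names, fasta_ids):
    """Total reads for names entries whose representative is in the fasta."""
    return sum(len(m) for rep, m in names.items() if rep in fasta_ids)


# Toy input: 3 unique sequences representing 6 reads.
names = {"seqA": ["seqA", "seqB"], "seqC": ["seqC", "seqD", "seqE"], "seqF": ["seqF"]}
merges = {"seqC": "seqA"}  # hypothetical merge standing in for pre.cluster's result

counts_after = merge_counts(counts_from_names(names), merges)
fasta_after = set(counts_after)  # representatives left in the "pre-clustered" fasta

print(sum(counts_after.values()))                               # 6: count table keeps the total
print(total_in_fasta(names, fasta_after))                       # 3: old names file loses seqC's reads
print(total_in_fasta(merge_names(names, merges), fasta_after))  # 6: merged names file agrees
```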
