Trouble keeping both updated names file and a count_table

roey.angel · November 11, 2013, 5:49pm

The MiSeq SOP tends to work primarily with count_table (which I guess replaces the old .groups file), rather than names file, so as to keep information about different samples.

Many functions, however (e.g. unique.seqs, split.abund, pre.cluster etc.), accept either a names file or a count_table, but not both. The result is that after a few steps of sequence gleaning with a count_table, the names file is no longer up to date and trying to use it gives an error. The trouble is that there’s no way to convert between the two (names into count_table and vice versa).
For most analyses just having the count_table file is fine, but for working with some external tools I need to work with a de-uniqued fasta file. However, the function deunique.seqs only accepts a names file…

Any solution to the problem?

I tried running functions in parallel (once with a names file and once with a count_table) but ran into discrepancies after a while.
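As a stopgap until deunique.seqs accepts a count file, a de-uniqued fasta can be approximated outside mothur. The Python sketch below is only an illustration and not part of mothur. It assumes a count_table with a single header line and the per-sequence total in the second column. The function names (`read_fasta`, `deunique`) and the `_1`, `_2`, ... suffixes on the copies are hypothetical; the original read names are not stored in a count_table, so they cannot be recovered this way.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the identifier, drop any description after whitespace.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def deunique(fasta_lines, count_lines):
    """Return FASTA text with every unique sequence repeated `total` times.

    Assumption: the first line of the count_table is a header and the
    second column of each row holds the total abundance.
    """
    rows = iter(count_lines)
    next(rows, None)  # skip the header line
    totals = {}
    for row in rows:
        fields = row.split()
        if len(fields) >= 2:
            totals[fields[0]] = int(fields[1])

    out = []
    for name, seq in read_fasta(fasta_lines):
        # Hypothetical naming scheme for the copies: name_1, name_2, ...
        for copy in range(1, totals[name] + 1):
            out.append(f">{name}_{copy}\n{seq}")
    return "\n".join(out) + "\n"
```

With real files, the two arguments could simply be open file handles for the unique fasta and the matching count_table.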

westcott · November 12, 2013, 2:46pm

Thanks for the feature request! We will add the count table to the deunique.seqs command for the next release of mothur.

roey.angel · November 12, 2013, 5:08pm

Thanks a lot for considering my needs.
I was wondering if there’s a workaround in the meantime.
I’ve noticed that things start to diverge from the point of running pre.cluster(). The function behaves differently when given a name file vs a count_table. If I run it with count_table:

pre.cluster(fasta=xxx, count=xxx, diffs=3)

then summary.seqs() gives different results with the old names file than with the new count_table:

summary.seqs(fasta=XXX.precluster.fasta, count=XXX.precluster.count_table)

gives:

# of unique seqs: 9165

total # of seqs: 348502

but:

summary.seqs(fasta=XXX.precluster.fasta, name=XXX.names)

gives:

# of unique seqs: 9165

total # of seqs: 281982

Why is that? and how can I work around it?

Thanks again for all the support.

westcott · November 12, 2013, 5:42pm

Hmmm… That looks odd. Are you sure your names and count files have the same number of lines? Can you try running the count.groups command on the count file, and on the names and groups files?
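One way to make that comparison independently of mothur is to tally both files directly. This is a rough Python sketch, assuming a two-column names file (representative name, then a comma-separated list of reads) and a count_table with one header line and the total in the second column. The helper names are made up for illustration.

```python
def names_summary(names_lines):
    """Return (unique, total) for a names file given as an iterable of lines."""
    unique = total = 0
    for line in names_lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        unique += 1
        total += len(fields[1].split(","))
    return unique, total


def count_summary(count_lines):
    """Return (unique, total) for a count_table given as an iterable of lines.

    Assumption: one header line, total abundance in the second column.
    """
    rows = iter(count_lines)
    next(rows, None)  # skip the header line
    unique = total = 0
    for row in rows:
        fields = row.split()
        if len(fields) < 2:
            continue
        unique += 1
        total += int(fields[1])
    return unique, total
```

If the two tuples differ for files that are supposed to describe the same fasta, one of them is out of date.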

roey.angel · November 14, 2013, 2:42pm

Before running pre.cluster(), the count_table and names files have exactly the same number of sequences:

summary.seqs(fasta=XXX.fasta, name=XXX.names)

# of unique seqs: 27096

total # of seqs: 348502

summary.seqs(fasta=XXX.fasta, count=XXX.count_table)

# of unique seqs: 27096

total # of seqs: 348502

After running pre.cluster():

pre.cluster(fasta=XXX.fasta, count=XXX.count_table, diffs=3)

I use the new fasta and count_table files:

summary.seqs(fasta=XXX.precluster.fasta, count=XXX.precluster.count_table)

and get:

# of unique seqs: 9165

total # of seqs: 348502

Or the new fasta file with the old names file (I have no new names file):

summary.seqs(fasta=XXX.precluster.fasta, name=XXX.names)

and get:

# of unique seqs: 9165

total # of seqs: 281982

What’s happening? pre.cluster() isn’t supposed to reduce the total number of sequences, only the number of unique sequences.

Using mothur v.1.32.0

Thanks!
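A possible explanation, not confirmed anywhere in this thread: the old names file still lists each of the 27096 pre-merge representatives with its own reads. If summary.seqs only looks up the 9165 names that remain in the preclustered fasta, the reads of representatives that were merged away would never be counted, which would account for the shortfall of 348502 - 281982 = 66520. The toy Python sketch below illustrates that arithmetic; the function name and the data are invented.

```python
def total_from_names(fasta_names, names_lines):
    """Sum names-file cluster sizes for the representatives present in a fasta."""
    sizes = {}
    for line in names_lines:
        fields = line.split()
        if len(fields) == 2:
            sizes[fields[0]] = len(fields[1].split(","))
    return sum(sizes[name] for name in fasta_names)


# Names file written BEFORE pre.cluster: three representatives, six reads.
old_names = ["s1\ts1,s3,s4", "s2\ts2,s5", "s6\ts6"]

# Suppose pre.cluster merged s6 into s1, so the new fasta keeps s1 and s2 only.
before = total_from_names(["s1", "s2", "s6"], old_names)
after = total_from_names(["s1", "s2"], old_names)

# The read belonging to s6 is no longer reachable through the stale names file.
print(before, after)
```

An updated count_table would instead credit s1 with the read from s6, keeping the total unchanged.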

chassenr · January 16, 2015, 4:41pm

Hi,
I have a similar problem with the pre.cluster command (mothur v.1.33).
I ran pre.cluster 3 times with the same input fasta file and corresponding 1) count_table, 2) name file and 3) name and group file.
Each time the total number of sequences stayed the same (as expected), but the number of unique sequences varied dramatically (several thousands) between the three runs.
Shouldn’t the number of sequences that get ‘merged’ into unique sequences stay the same for the same input, regardless of which group (i.e. sample) the sequences are from?
If the group information does matter during pre.cluster, why is there still a difference in the output between option 1 and 3?
Does it matter that I split the job across several processors for run 3?

pre.cluster(fasta=All.unique.good.filter.unique.fasta, count=All.unique.good.filter.count_table, diffs=3)
unique 120858
total 576401

pre.cluster(fasta=All.unique.good.filter.unique.fasta, name=All.unique.good.filter.names, diffs=3)
unique 112858
total 576401

pre.cluster(fasta=All.unique.good.filter.unique.fasta, name=All.unique.good.filter.names, diffs=3, group=All.good.groups, processors=8)
unique 120618
total 576401

Thanks!

westcott · January 19, 2015, 6:21pm

When running pre.cluster without groups the names and count files should match. Can you try running the following?

mothur > pre.cluster(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.fasta, name=GQY1XT001.shhh.trim.unique.good.filter.names, diffs=2)
mothur > summary.seqs(name=current)

# of unique seqs: 5271

total # of seqs: 67746

mothur > make.table(name=GQY1XT001.shhh.trim.unique.good.filter.names)
mothur > pre.cluster(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.fasta, count=current, diffs=2)
mothur > summary.seqs(count=current)

# of unique seqs: 5271

total # of seqs: 67746

When running pre.cluster with groups there may be some slight variation. This is caused by the ties in abundance. The pre.cluster command clusters within the groups. It sorts the sequences by abundance and in the case of ties by sequence name. When using the names file, mothur uses the names of the redundant sequences from the names file. The count file only has unique sequence names, so the order of ties in abundance can be slightly different.

mothur > pre.cluster(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.fasta, name=GQY1XT001.shhh.trim.unique.good.filter.names, group=GQY1XT001.shhh.good.groups, diffs=2)
mothur > summary.seqs(name=current)

# of unique seqs: 6804

total # of seqs: 67746

mothur > make.table(name=GQY1XT001.shhh.trim.unique.good.filter.names, group=GQY1XT001.shhh.good.groups)
mothur > pre.cluster(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.fasta, count=current, diffs=2)
mothur > summary.seqs(count=current)

# of unique seqs: 6782

total # of seqs: 67746
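The effect of tie order described above can be illustrated with a toy greedy clustering in Python. This is a simplified sketch and not mothur's actual implementation: sequences are sorted by abundance, ties are broken by name, and each surviving sequence absorbs every later one within `diffs` mismatches. The function names and the data are invented.

```python
def mismatches(a, b):
    """Count mismatching positions between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def greedy_precluster(records, diffs):
    """Return the names of the sequences that survive as cluster centres.

    records: list of (name, sequence, abundance) tuples.
    """
    # Most abundant first; ties in abundance are broken by name.
    ordered = sorted(records, key=lambda r: (-r[2], r[0]))
    merged = [False] * len(ordered)
    centres = []
    for i, (name_i, seq_i, _) in enumerate(ordered):
        if merged[i]:
            continue
        centres.append(name_i)
        for j in range(i + 1, len(ordered)):
            if not merged[j] and mismatches(seq_i, ordered[j][1]) <= diffs:
                merged[j] = True
    return centres


if __name__ == "__main__":
    # Same three sequences, all with abundance 1, but different tie-break names.
    run_a = [("a", "AAAA", 1), ("b", "AAAT", 1), ("c", "AATT", 1)]
    run_b = [("y", "AAAA", 1), ("x", "AAAT", 1), ("z", "AATT", 1)]
    print(greedy_precluster(run_a, diffs=1))  # ['a', 'c']
    print(greedy_precluster(run_b, diffs=1))  # ['x']
```

In the first run AAAA is processed first and cannot reach AATT, leaving two uniques. In the second run AAAT is processed first and absorbs both neighbours, leaving one. The total read count is the same either way, which matches the pattern in the runs above.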
