Clarity on Cluster .names inclusion

juliacope · October 16, 2012, 8:48pm

I am seeking clarity on why one would include a .names file in the cluster (or cluster.split) command. I have a small, highly redundant dataset of sequences from isolates. When I run unique.seqs, a large proportion of my sequences are collapsed and recorded only in the names file. Is including the names file in the clustering command the only way to link those sequences back into the dataset before using the list for binning? I ask because I presume this will make computation time terrible later, when I compare these isolates back to the pyrosequencing data for the same samples. Is there an easier way to deunique the list that I have missed?

My current script is below:

summary.seqs(fasta=sample.fas)
screen.seqs(fasta=sample.fas, maxambig=1)
unique.seqs(fasta=sample.good.fas)
dist.seqs(fasta=sample.good.unique.fas, output=square)
cluster(phylip=sample.good.unique.square.dist, name=sample.good.names)
bin.seqs(list=sample.good.unique.square.an.list, fasta=sample.good.unique.fas, name=sample.good.names, label=0.03)
get.sharedseqs(list=sample.good.unique.square.an.list, group=ox.groups, fasta=sample.good.fas)
parse.list(list=sample.good.unique.square.an.list, group=ox.groups, label=0.03)
make.shared(list=sample.good.unique.square.an.list, group=ox.groups, label=0.03)
venn(shared=sample.good.unique.square.an.shared, nseqs=T, label=0.03)

Thanks in advance,

Julia Cope

westcott · October 17, 2012, 11:01am

The names file is used by the cluster command to add the redundant names into the list file, but it is also needed for the clustering calculations. The average neighbor method uses the number of sequences in each OTU to weight the distance when merging OTUs.
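
To make the weighting concrete, here is a minimal Python sketch of size-weighted average linkage. It is an illustration of the idea only, not mothur's actual code; the function name `merged_distance` and all numbers are hypothetical.

```python
def merged_distance(d_ac, d_bc, n_a, n_b):
    """Distance from the merged OTU (A+B) to a third OTU C.

    Each input distance is weighted by the number of sequences
    in the OTU it comes from (size-weighted average linkage).
    """
    return (n_a * d_ac + n_b * d_bc) / (n_a + n_b)


# Hypothetical case: unique sequence A represents 9 identical reads
# (known only from the names file), and B represents a single read.
with_names = merged_distance(0.02, 0.06, n_a=9, n_b=1)

# Without the names file, each unique sequence would count once.
without_names = merged_distance(0.02, 0.06, n_a=1, n_b=1)

print(with_names, without_names)
```

With the counts included, the merged distance is pulled toward the abundant sequence (about 0.024 rather than 0.04 here), which can change which OTUs merge at a given cutoff.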

juliacope · October 17, 2012, 4:39pm

Here’s the thing: it doesn’t appear to work out that way. I figured I had missed another name= inclusion in the script I posted. When I run the script as posted, the sequences collapsed by unique.seqs aren’t picked up as separate sequences in the groups file or as members of their respective OTUs. I actually see no shared sequences between groups, even where unique.seqs grouped sequences from different groups together. The list file shows them, but get.sharedseqs shows nothing shared at the unique level and beyond. Could you shed some light?

-Julia

westcott · October 17, 2012, 5:30pm

When you run get.sharedseqs without using the unique or shared parameter to select groups, mothur assumes you want the OTUs that contain sequences from all groups. I suspect you don’t have any OTUs that contain all groups. http://www.mothur.org/wiki/Get.sharedseqs#unique_.26_shared

If you enter your groups under the unique parameter, mothur will return the OTUs that contain ONLY sequences from those groups.
If you enter your groups under the shared parameter, mothur will return the OTUs that contain sequences from those groups and may also contain sequences from other groups.
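
The difference between the three selection modes can be sketched in Python as set comparisons. This is only an illustration of the rules described above, not mothur's implementation; the function `select_otus`, the OTU labels, and the group names A, B, and C are hypothetical, and the sketch assumes "unique" means every listed group is present and no others.

```python
def select_otus(otu_groups, all_groups, unique=None, shared=None):
    """Return the OTU labels selected under each mode.

    otu_groups: dict mapping OTU label -> set of groups with sequences in it.
    unique:     OTU must contain exactly the listed groups (assumption).
    shared:     OTU must contain the listed groups; others are allowed.
    default:    OTU must contain every group in the dataset.
    """
    selected = []
    for otu, present in otu_groups.items():
        if unique is not None:
            keep = present == set(unique)
        elif shared is not None:
            keep = set(shared) <= present
        else:
            keep = present == set(all_groups)
        if keep:
            selected.append(otu)
    return selected


# Hypothetical dataset with three groups and three OTUs.
all_groups = ["A", "B", "C"]
otu_groups = {
    "Otu1": {"A", "B"},
    "Otu2": {"A", "B", "C"},
    "Otu3": {"A"},
}

default_hits = select_otus(otu_groups, all_groups)
unique_hits = select_otus(otu_groups, all_groups, unique=["A", "B"])
shared_hits = select_otus(otu_groups, all_groups, shared=["A", "B"])
print(default_hits, unique_hits, shared_hits)
```

In this toy dataset the default mode finds only the OTU present in all three groups, so a dataset with no such OTU would return nothing, which matches the empty result described above.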
