Clarity on Cluster .names inclusion

I am seeking clarity on why one would include a .names file in the cluster (or cluster.split) command. I have a small, highly redundant dataset of sequences from isolates. When I run unique.seqs, a large proportion of my sequences are collapsed into the names file. Is including the names file in the clustering command the only way to link those sequences back into the dataset before using the list for binning? I ask because I presume this will make computation time terrible later, when I compare these isolates back to the pyrosequencing data for the same samples. Is there an easier way to deunique the list that I have missed?

My current script is below:

screen.seqs(fasta=sample.fas, maxambig=1)
dist.seqs(fasta=sample.good.unique.fas, output=square)
cluster(phylip=sample.good.unique.square.dist, name=sample.good.names)
bin.seqs(, fasta=sample.good.unique.fas, name=sample.good.names, label=0.03)
get.sharedseqs(, group=ox.groups, fasta=sample.good.fas)
parse.list(, group=ox.groups, label=0.03)
make.shared(, group=ox.groups, label=0.03)
venn(, nseqs=T, label=0.03)

Thanks in advance,

Julia Cope

The names file is used by the cluster command to add the redundant names into the list file, but it is also needed for the clustering calculations. The average neighbor method uses the number of sequences in each OTU to weight the distances when merging OTUs.
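To illustrate why the counts matter, here is a minimal Python sketch. It assumes the usual two-column names layout (representative name, a tab, then a comma-separated list of every name it stands for) and uses the textbook size-weighted average-linkage update. The function names parse_names and merged_distance and the sequence names are made up for the example; this is not mothur's actual code.

```python
def parse_names(text):
    """Map each representative name to the list of names it stands for."""
    names = {}
    for line in text.strip().splitlines():
        rep, members = line.split("\t")
        names[rep] = members.split(",")
    return names


def merged_distance(d_ik, d_jk, n_i, n_j):
    """Size-weighted average distance from the merged OTU (i + j) to OTU k."""
    return (n_i * d_ik + n_j * d_jk) / (n_i + n_j)


# Hypothetical names file: seqA stands for three identical sequences, seqB for one.
names = parse_names("seqA\tseqA,seqA2,seqA3\nseqB\tseqB\n")
n_a = len(names["seqA"])
n_b = len(names["seqB"])

# Distances from seqA and seqB to some third OTU k (made-up values).
weighted = merged_distance(0.02, 0.06, n_a, n_b)  # counts taken from the names file
unweighted = merged_distance(0.02, 0.06, 1, 1)    # what you would get without it
print(weighted, unweighted)
```

With the counts the merged distance is pulled toward the more abundant sequence (roughly 0.03 instead of 0.04 here), which can decide whether an OTU forms at a 0.03 cutoff.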

Here’s the thing: it doesn’t appear to work out that way. I figured I had missed another name= inclusion in the script I posted. When I run the script as posted, the redundant sequences aren’t picked up as separate sequences in the groups file or as members of their respective OTUs. I actually see no shared sequences between groups in cases where unique.seqs grouped them together. The list file shows them, but get.sharedseqs shows nothing shared at the unique level and beyond. Could you shed some light?


When you run get.sharedseqs without using the unique or shared parameter to select groups, mothur assumes you want the OTUs unique to all groups, that is, OTUs that contain sequences from every group. I suspect you don’t have any OTUs that contain all groups.

If you enter your groups under the unique parameter, mothur will return the OTUs that contain ONLY sequences from those groups.
If you enter your groups under the shared parameter, mothur will return the OTUs that contain sequences from those groups and may also contain sequences from other groups.
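
The difference between the two parameters can be sketched in a few lines of Python. This is only an illustration of the selection rules described above, not mothur's implementation; the group names A, B, C, the sequence names, and the helper functions are invented, and it assumes "unique" means the OTU contains every listed group and no others.

```python
def otu_groups(otu, seq_to_group):
    """Return the set of groups represented in one OTU."""
    return {seq_to_group[seq] for seq in otu}


def select_unique(otus, seq_to_group, groups):
    """OTUs containing sequences from exactly the listed groups and no others."""
    wanted = set(groups)
    return [otu for otu in otus if otu_groups(otu, seq_to_group) == wanted]


def select_shared(otus, seq_to_group, groups):
    """OTUs containing every listed group, possibly with other groups too."""
    wanted = set(groups)
    return [otu for otu in otus if wanted <= otu_groups(otu, seq_to_group)]


# Hypothetical groups file contents (sequence -> group) and OTUs from a list file.
seq_to_group = {"a1": "A", "a2": "A", "a3": "A",
                "b1": "B", "b2": "B", "c1": "C"}
otus = [["a1", "b1"], ["a2", "b2", "c1"], ["a3"]]

unique_ab = select_unique(otus, seq_to_group, ["A", "B"])
shared_ab = select_shared(otus, seq_to_group, ["A", "B"])
all_groups = select_unique(otus, seq_to_group, ["A", "B", "C"])  # the default case
print(unique_ab, shared_ab, all_groups)
```

In this toy data only one OTU contains all three groups, so the default selection returns a single OTU; if no OTU spanned every group, it would return nothing, which matches the empty result described above.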