I am seeking clarity on why one would include a .names file in the cluster (or cluster.split) command. I have a small, highly redundant dataset of sequences from isolates. When I run unique.seqs, a large proportion of my sequences are collapsed into their unique representatives and recorded only in the names file. Is including the names file in the clustering command the only way to link those sequences back into the dataset before using the list for binning? I ask because I presume this will make computation time much worse later, when I compare these isolates back to the pyrosequencing data for the same samples. Is there an easier way to deunique the list that I have missed?
My current script is below:
summary.seqs(fasta=sample.fas)
screen.seqs(fasta=sample.fas, maxambig=1)
unique.seqs(fasta=sample.good.fas)
dist.seqs(fasta=sample.good.unique.fas, output=square)
cluster(phylip=sample.good.unique.square.dist, name=sample.good.names)
bin.seqs(list=sample.good.unique.square.an.list, fasta=sample.good.unique.fas, name=sample.good.names, label=0.03)
get.sharedseqs(list=sample.good.unique.square.an.list, group=ox.groups, fasta=sample.good.fas)
parse.list(list=sample.good.unique.square.an.list, group=ox.groups, label=0.03)
make.shared(list=sample.good.unique.square.an.list, group=ox.groups, label=0.03)
venn(shared=sample.good.unique.square.an.shared, nseqs=T, label=0.03)
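To make the question concrete, here is a minimal Python sketch of what I mean by linking the redundant sequences back in. It assumes the names file has two tab-separated columns: the representative sequence, then a comma-separated list of every sequence it stands for (including itself). The sequence names (isoA, isoB, ...) and the helper functions parse_names and expand_bin are made up for illustration; this is not how mothur does it internally.

```python
# Hypothetical illustration: sequence names and helper names are invented.

def parse_names(lines):
    """Map each representative sequence to every sequence it stands for."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumed layout: representative <TAB> comma-separated member names
        representative, members = line.split("\t")
        mapping[representative] = members.split(",")
    return mapping


def expand_bin(otu, mapping):
    """Replace each representative in an OTU with its full member list."""
    expanded = []
    for name in otu:
        # Names missing from the mapping are kept unchanged.
        expanded.extend(mapping.get(name, [name]))
    return expanded


# Two made-up lines in the assumed names-file layout.
names_lines = [
    "isoA\tisoA,isoB,isoC",
    "isoD\tisoD",
]
mapping = parse_names(names_lines)

# One made-up OTU containing only unique representatives.
otu = ["isoA", "isoD"]
full_otu = expand_bin(otu, mapping)
print(full_otu)
```

In this toy case an OTU built from two unique sequences expands to four sequences once the names mapping is applied, which is the step I am unsure how best to do.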
Thanks in advance,
Julia Cope