mysterious seqID from cluster.split


cluster.split gave an error message after splitting the distance file:
“Error: Sequence ‘05’ was not found in the names file, please correct”

Here are the commands I’ve executed:

align.seqs(fasta=seqAll.unique.fasta, reference=silva.bacteria.fasta, flip=t)
filter.seqs(fasta=seqAll.unique.align, vertical=T)
chimera.uchime(fasta=seqAll.unique.filter.fasta, name=seqAll.names, processors=2)
remove.seqs(fasta=seqAll.unique.filter.fasta, group=groupAll.groups, accnos=seqAll.unique.filter.uchime.accnos, name=seqAll.names)
dist.seqs(fasta=seqAll.unique.filter.pick.fasta, cutoff=0.1, processors=3)
cluster.split(column=seqAll.unique.filter.pick.dist, name=seqAll.pick.names, large=T, method=average, cutoff=0.05)

Output I saw when running cluster.split:

Using 1 processors.
Using splitmethod distance.
Splitting the file...
It took 62636 seconds to split the distance file.

Reading seqAll.unique.filter.pick.dist.2.temp
Reading matrix:     ||||||AAError: Sequence '05' was not found in the names file, please correct

I checked, and there is no sequence ‘05’ in the full distance file. All my sequence IDs start with “gnl”.
I just emailed seqAll.unique.filter.pick.dist.2.temp and seqAll.pick.names.2.temp to mothur.bugs.

What could be the problem?

I suspect you have a stray space or tab character somewhere in your files. Did you open and modify any of these text files?

Nope, I didn’t touch any of the generated files.

It appears that an error occurred while the split distance files were being written out. I am not sure what caused it, but I can see the malformed line (the last of the three below) in your split distance file:

gnl|SRA|SRR050533.2965.4 gnl|SRA|SRR050467.6375.4 0.02417
gnl|SRA|SRR050533.2965.4 gnl|SRA|SRR050478.5215.4 0.05405
05 |SRA|SRR050511.1115.4 0.05405

Can you try cluster.split(large=f, …)? Or try splitting by taxonomy? I may be able to track down the splitting issue if you send me the full distance and names files. I can’t reproduce it with our test data.
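
In the meantime, one way to look for other malformed lines like the ‘05’ line above is to scan the split distance file against the names file. The Python sketch below is not part of mothur. It assumes the column-format distance file has three whitespace-separated fields per line (two sequence names and a distance) and that the first column of the names file holds the representative names used in the distance file. The function names and the script name are hypothetical.

```python
import sys


def load_names(names_lines):
    """Collect representative IDs (first column) from names-file lines."""
    ids = set()
    for line in names_lines:
        fields = line.split()
        if fields:
            ids.add(fields[0])
    return ids


def find_bad_lines(dist_lines, known_ids):
    """Return (line_number, reason, text) for each suspicious distance line."""
    problems = []
    for number, line in enumerate(dist_lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            continue  # ignore blank lines
        fields = text.split()
        if len(fields) != 3:
            problems.append((number, "expected 3 fields", text))
            continue
        seq_a, seq_b, dist = fields
        try:
            float(dist)
        except ValueError:
            problems.append((number, "distance is not a number", text))
            continue
        # A truncated or overwritten name shows up as an ID missing from the names file.
        unknown = [s for s in (seq_a, seq_b) if s not in known_ids]
        if unknown:
            problems.append((number, "unknown ID: " + ", ".join(unknown), text))
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    names_path, dist_path = sys.argv[1], sys.argv[2]
    with open(names_path) as handle:
        known = load_names(handle)
    with open(dist_path) as handle:
        for number, reason, text in find_bad_lines(handle, known):
            print(f"line {number}: {reason}: {text}")
```

Saved as, say, check_dist.py, it could be run as `python check_dist.py seqAll.pick.names.2.temp seqAll.unique.filter.pick.dist.2.temp`. Running it on the full, unsplit files as well would show whether the damage is already present before the split or only appears in the temp files.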

Thanks, Sarah.
I have sent the files to you.

Thanks for sending your files. I have them running now, and will let you know what I find.

I was not able to reproduce the issue. Were you able to run the cluster.split command by taxonomy or with large=f?