Cluster error (from names file)

I’m running into an error with the cluster command:
cluster(column=1.dist, cutoff=0.10, name=1.pick.names)
Reading matrix: ||
Error: Sequence ‘1-MS28F-388R__M02696_39_000000000-AH733_1_1118_5602_16641’ was not found in the names file, please correct

I checked the names file, and the sequence name definitely exists: it’s the first sequence to be after a comma in the 1.pick.names file. So it looks like the cluster command isn’t reading the names correctly after the comma, but based on the wiki, that shouldn’t be a problem. Any ideas about what is going on?

Thanks!

it’s the first sequence to be after a comma in the 1.pick.names file

I think that’s the problem. It should be the first sequence on the line and the first sequence after the tab. That line should look something like…

1-MS28F-388R__M02696_39_000000000-AH733_1_1118_5602_16641\t1-MS28F-388R__M02696_39_000000000-AH733_1_1118_5602_16641,somethingA,somethingB,somethingC
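To check a line against that layout, here is a minimal Python sketch. It assumes a names-file line is the representative name, a single tab, then a comma-separated list whose first entry is the representative itself. `parse_names_line` is a hypothetical helper for illustration, not part of mothur.

```python
def parse_names_line(line):
    """Split one names-file line into (representative, [members]).

    Assumed layout (hypothetical check, not mothur's own parser):
    representative<TAB>representative,member2,member3,...
    """
    rep, sep, members = line.rstrip("\r\n").partition("\t")
    if not sep:
        # No tab at all: the line cannot be split into the two columns
        raise ValueError(f"no tab found in line: {line!r}")
    member_list = members.split(",")
    if member_list[0] != rep:
        # The representative should also be the first name after the tab
        raise ValueError(f"{rep!r} is not the first name in its own list")
    return rep, member_list


if __name__ == "__main__":
    print(parse_names_line("seqA\tseqA,somethingA,somethingB\n"))
```

A sequence that only appears after a comma is a member of another sequence's group, not a representative, so a command that looks up names in the first column will not find it there.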

Yes, that makes sense. The names file came directly from the unique.seqs command, without alteration, so something about that command seems to be producing a .names file that can’t be used. Is there any other way to make a names file besides the unique.seqs command? Or is there a way to correct the names file that unique.seqs makes?

Thanks for your help!

I just noticed that your names file has a “pick” in the name. I suspect you removed sequences from the names file that weren’t also removed from the distance matrix. Can you make sure that you’re using the pick.fasta file as the input to dist.seqs?
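One way to confirm this kind of mismatch is to list every sequence in the distance matrix that is not a representative in the names file. The Python sketch below assumes the matrix is in column format (two sequence names and a distance per line, whitespace-separated) and that the representative is the first tab-separated field of each names-file line. `missing_from_names` is a hypothetical helper, not a mothur command.

```python
def missing_from_names(dist_lines, names_lines):
    """Return sorted sequence names found in a column-format distance
    matrix but not as a representative (first field) in a names file."""
    reps = set()
    for line in names_lines:
        line = line.strip()
        if line:
            # Representative is everything before the first tab
            reps.add(line.split("\t", 1)[0])

    missing = set()
    for line in dist_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        for name in fields[:2]:  # the two sequence-name columns
            if name not in reps:
                missing.add(name)
    return sorted(missing)


if __name__ == "__main__":
    dist = ["seqA\tseqB\t0.05", "seqA\tseqX\t0.08"]
    names = ["seqA\tseqA,seqC", "seqB\tseqB"]
    print(missing_from_names(dist, names))  # → ['seqX']
```

With real files, both open file objects (for example the 1.dist and 1.pick.names files from this thread) can be passed directly, since iterating a file yields its lines. A non-empty result means the matrix still contains sequences that were dropped from the names file, which is consistent with the matrix having been built from the unfiltered fasta rather than the pick.fasta.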