Remove all identical sequences from *.names file

Is there any way to remove entire rows from a *.names file? remove.seqs only removes the specified accession numbers and leaves the other identical sequences in place. I ran the chimera check on my *.unique.fasta file and found more than 2000 chimeric sequences, so I want to remove those, plus anything identical to one of them, from my names file… and I don’t want to do this manually!

I started with 28746 unique sequences, and a names file with an equal number of lines.

All.pick.good.filter.filter.unique.fasta - 28746 seqs

All.pick.good.filter.filter.names - 28746 lines; 254640 accnos

The chimera check found 2273 chimeric sequences among those 28746.

CHIMERA CHECK

chimera.seqs(fasta=All.pick.good.filter.filter.unique.fasta, template=silva.filter.filter.fasta, method=pintail, processors=2)

See silva_chimeras.xls; 2273 chimeric seqs

Remove putative chimeric sequences

remove.seqs(accnos=silva_chimeras, fasta=All.pick.good.filter.filter.unique.fasta)
remove.seqs(accnos=silva_chimeras, name=All.pick.good.filter.filter.names)
remove.seqs(accnos=silva_chimeras, group=sample.pick.good.group)

All.pick.good.filter.filter.unique.pick.fasta - 26473 seqs

All.pick.good.filter.filter.pick.names - 27276 lines; 252367 accnos

sample.pick.good.pick.group - 252367 seqs

After running remove.seqs, my *.pick.names file has more lines (27276) than my *.unique.pick.fasta has sequences (26473), a difference of 803 rows. This is because remove.seqs only takes away the first accession number in the row and leaves the rest, even though all accession numbers in a row have identical sequences. So in this case I haven’t actually removed all of the chimeric sequences from the dataset.

mothur currently doesn’t have a way to do what you want, but thanks to your suggestion we are modifying remove.seqs in 1.9.
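Until that change is available, the filtering could be done outside mothur. Below is a minimal Python sketch, not part of mothur. It assumes the names file has two whitespace-separated columns (the representative name, then a comma-separated list of all identical names) and that the accnos file is a plain list with one name per line. The function names and the output file names in the usage note are made up for illustration. It drops every row that contains a flagged name and collects all names from the dropped rows, so the duplicates can be removed from the group file as well.

```python
import sys


def read_accnos(lines):
    """Return the set of sequence names listed one per line."""
    return {line.strip() for line in lines if line.strip()}


def drop_rows(lines, flagged):
    """Drop every names-file row that contains a flagged name.

    Returns (kept_rows, removed_names). removed_names lists every name
    from the dropped rows, so duplicates of flagged sequences are included.
    """
    kept, removed = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected 2 columns, got: {line!r}")
        rep, members = fields
        names = members.split(",")
        if rep not in names:
            # Assumption: the representative is normally repeated in column 2;
            # add it here in case a file omits it.
            names.insert(0, rep)
        if flagged.intersection(names):
            removed.extend(names)
        else:
            kept.append(f"{rep}\t{members}")
    return kept, removed


def main(names_path, accnos_path, out_names_path, out_accnos_path):
    """Write the filtered names file and an expanded accnos list."""
    with open(accnos_path) as handle:
        flagged = read_accnos(handle)
    with open(names_path) as handle:
        kept, removed = drop_rows(handle, flagged)
    with open(out_names_path, "w") as handle:
        handle.writelines(row + "\n" for row in kept)
    with open(out_accnos_path, "w") as handle:
        handle.writelines(name + "\n" for name in removed)


# Only runs when called with exactly four file arguments.
if __name__ == "__main__" and len(sys.argv) == 5:
    main(*sys.argv[1:])
```

Saved as, say, drop_rows.py (a hypothetical name), it could be run as `python drop_rows.py All.pick.good.filter.filter.names silva_chimeras clean.names expanded.accnos`. The expanded list could then be passed to remove.seqs with group=sample.pick.good.group, so that the group file loses the duplicates too.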