remove all identical sequences from *.names file

mbakker · February 21, 2010, 1:23pm

Is there any way to remove entire rows from a *.names file? remove.seqs only removes the accession numbers listed in the accnos file, leaving other identical sequences in place. I ran the chimera check on my *.unique.fasta file and found more than 2000 chimeric sequences, so I want to remove those, plus anything identical to one of them, from my names file, and I don't want to do this manually!

I started with 28746 unique sequences, and a names file with an equal number of lines.

All.pick.good.filter.filter.unique.fasta - 28746 seqs

All.pick.good.filter.filter.names - 28746 lines; 254640 accnos

The chimera check finds 2273 chimeric sequences among those 28746.

CHIMERA CHECK

chimera.seqs(fasta=All.pick.good.filter.filter.unique.fasta, template=silva.filter.filter.fasta, method=pintail, processors=2)

See silva_chimeras.xls; 2273 chimeric seqs

Remove putative chimeric sequences

remove.seqs(accnos=silva_chimeras, fasta=All.pick.good.filter.filter.unique.fasta)
remove.seqs(accnos=silva_chimeras, name=All.pick.good.filter.filter.names)
remove.seqs(accnos=silva_chimeras, group=sample.pick.good.group)

All.pick.good.filter.filter.unique.pick.fasta - 26473 seqs

All.pick.good.filter.filter.pick.names - 27276 lines; 252367 accnos

sample.pick.good.pick.group - 252367 seqs

After running remove.seqs, you can see that I have more lines in my *.pick.names file (27276) than sequences in my *.unique.pick.fasta (26473). This is because remove.seqs only takes away the first accession number in the row and leaves the rest, even though all accession numbers in a row have identical sequences. So in this case I haven't actually removed all of the chimeric sequences from the dataset.
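The reported counts appear consistent with that reading. Here is a quick arithmetic check in Python. It assumes (this is not confirmed in the thread) that a row vanishes from the names file only when the removed name was the only one in it:

```python
# Counts copied from the post above.
unique_before = 28746   # sequences in *.unique.fasta, and lines in *.names
accnos_before = 254640  # names listed in *.names
chimeras = 2273         # sequences flagged by chimera.seqs

fasta_after = 26473     # sequences in *.unique.pick.fasta
lines_after = 27276     # lines in *.pick.names
accnos_after = 252367   # names left in *.pick.names

# Exactly one name was removed per flagged sequence.
names_removed = accnos_before - accnos_after

# Assumption: these rows held only the flagged name, so they vanished.
rows_dropped = unique_before - lines_after

# Rows that survive although their representative was flagged as chimeric.
rows_left_behind = lines_after - fasta_after

assert unique_before - fasta_after == chimeras
assert names_removed == chimeras
assert rows_dropped + rows_left_behind == chimeras

print(names_removed, rows_dropped, rows_left_behind)  # 2273 1470 803
```

Under that assumption, 803 rows of sequences identical to a flagged chimera are still in the names file.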

westcott · February 22, 2010, 1:02pm

mothur currently doesn’t have a way to do what you want, but thanks to your suggestion we are modifying remove.seqs in 1.9.
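In the meantime, the filtering can be done outside mothur. The following is a minimal sketch, not a mothur feature. It assumes the usual two-column names layout (representative name, then a comma-separated list of all identical names) and an accnos list with one name per line. The function name `drop_chimeric_rows` and the sample sequence names are hypothetical.

```python
def drop_chimeric_rows(names_lines, chimera_ids):
    """Drop every names-file row that contains a flagged name.

    Returns (kept_lines, removed_names). removed_names holds every name
    in the dropped rows, i.e. the flagged sequences and their duplicates.
    """
    flagged = set(chimera_ids)
    kept_lines = []
    removed_names = set()
    for raw in names_lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"unexpected names-file line: {line!r}")
        representative, members = parts
        row_names = set(members.split(","))
        row_names.add(representative)
        if flagged.isdisjoint(row_names):
            kept_lines.append(line)
        else:
            # All names in a row share one sequence, so drop the whole row.
            removed_names.update(row_names)
    return kept_lines, removed_names


# Small in-memory demonstration with made-up names.
example_names = [
    "seqA\tseqA,seqB,seqC",
    "seqD\tseqD",
    "seqE\tseqE,seqF",
]
kept, removed = drop_chimeric_rows(example_names, ["seqA", "seqD"])
print(kept)
print(sorted(removed))
```

Writing `removed` out one name per line gives an expanded accnos list. Passing it to remove.seqs with the group file should then also remove the duplicates there, since the group file has one line per name.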
