Remove duplicate entries from groups file

Kendra · January 7, 2013, 11:05pm

Somehow some sequences are duplicated in my groups file; I don't know how that happened. The duplicated lines are completely identical:

sequencexyz groupA
sequencexyz groupA

Any ideas on how to find and remove the duplicates? I tried list.seqs on the names file followed by get.seqs, but it selects both lines since they match the sequence name. This is a huge dataset (7.5M reads after cleaning and trimming), so I don't want to rerun trim.seqs unless there's no other way.

pschloss · January 8, 2013, 1:34pm

If you're using Mac/Unix, you can do:

sort file.groups | uniq > newfile.groups
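Note that sort reorders the lines. If the original order matters, an awk one-liner is one possible alternative. The following is only a minimal sketch: sample.groups and sample.dedup.groups are made-up file names, and the sample data is invented to stand in for a real groups file. The last command is an optional sanity check for sequence names that still appear more than once, which would mean the same sequence is assigned to different groups.

```shell
# Hypothetical sample data standing in for a real groups file
# (sequence name, then group name, separated by a tab).
printf 'seq2\tgroupB\nseq1\tgroupA\nseq2\tgroupB\nseq3\tgroupA\n' > sample.groups

# Keep only the first occurrence of each identical line, preserving order.
# seen[$0]++ is 0 (false) the first time a line is met, so !seen[$0]++ is true
# only on first sight and awk prints the line.
awk '!seen[$0]++' sample.groups > sample.dedup.groups

# Compare line counts before and after to see how many duplicates were dropped.
wc -l < sample.groups
wc -l < sample.dedup.groups

# Sanity check: list sequence names that still occur more than once.
# Empty output means every sequence name is now unique.
awk '{print $1}' sample.dedup.groups | sort | uniq -d
```

On a real dataset, replace sample.groups with the actual groups file and skip the printf line. awk holds every distinct line in memory, so for very large files the sort-based approach may be the lighter option.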

Kendra · January 8, 2013, 7:52pm

thanks
