How to make a "group" file

I had to change the names of my sequences after the trim.seqs command, and now I need to make a new .group file. Is there an automatic way to do it, or do I have to do it manually? :roll:


There’s no automatic way to do it, unfortunately. Why did you have to change the names? You’ll also have to change the names in the names file. You might check the old Sogin analysis example page to see how we made group files in the bad old days before barcodes.
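If you kept a table of old and new names when you renamed the sequences, a short script outside of mothur can apply the same renaming to an existing group file and names file so they stay in sync. The following is only a minimal sketch: the function names, the `rename_map` dictionary and the example sequence names are all hypothetical, and it assumes the usual two-column, whitespace-separated layout of both files.

```python
def rename_group_lines(group_lines, rename_map):
    """Replace the sequence name (column 1) of each group-file line."""
    renamed = []
    for line in group_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, group = fields[0], fields[1]
        renamed.append(rename_map.get(name, name) + "\t" + group)
    return renamed


def rename_names_lines(names_lines, rename_map):
    """Replace names in both columns of each names-file line."""
    renamed = []
    for line in names_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rep, members = fields[0], fields[1].split(",")
        new_members = ",".join(rename_map.get(m, m) for m in members)
        renamed.append(rename_map.get(rep, rep) + "\t" + new_members)
    return renamed


# Hypothetical example data: long original names mapped to short ones.
rename_map = {
    "very_long_original_name_0001": "s1",
    "very_long_original_name_0002": "s2",
}
old_group = [
    "very_long_original_name_0001\tgroup1",
    "very_long_original_name_0002\tgroup2",
]
old_names = [
    "very_long_original_name_0001\t"
    "very_long_original_name_0001,very_long_original_name_0002",
]

new_group = rename_group_lines(old_group, rename_map)
new_names = rename_names_lines(old_names, rename_map)
print("\n".join(new_group))
print("\n".join(new_names))
```

Names that are not in `rename_map` are left unchanged, so it is worth checking afterwards that every name in the renamed fasta file also appears in the new group file.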

Hi Pat,
Yes, I made the group file manually. After I trimmed the sequences I changed the names because they are too long (I am working with a protein-coding gene, so I have to translate my sequences using different programs, and if the names are too long it doesn’t work!) and made a .group file in Excel. I then ran unique.seqs on the fasta file, which gives me a new .names file.

My next question: for the screen.seqs command I also use the file …, and it works even though it does not contain the same sequences as the trim.newnames.unique.align file. Still, when I run count.groups on the file I get a different number of sequences than in trim.newnames.unique.good.align, and a different number than in the original file. How can we explain this? Should I worry? Or should I just make a new .group file each time to get the right number of sequences in each group as I go?
Do I make sense? :roll:


Hi Kim,

The trim.newnames.unique.align file contains only the unique sequences, but the group file lists all of the sequences, so it is fine if these two files have a different number of sequences in them. The difference in the number of sequences before and after screen.seqs is explained by the sequences that the command removed based on the criteria you gave it. This is also nothing to worry about. When running your analysis in mothur, be sure to include the names file. The names file is what relates the fasta and group files. For example:



Names file:
seq1 seq1,seq3,seq5,seq6
seq2 seq2,seq7,seq8
seq4 seq4

Group file:
seq1 group1
seq2 group1
seq3 group1
seq4 group1
seq5 group2
seq6 group2
seq7 group3
seq8 group3

Your fasta file contains 3 sequences, but your group file contains 8. Looking at the names file we can see that seq1 represents 4 sequences, seq2 represents 3 and seq4 represents 1, totaling 8. Group1 contains 4 sequences, group2 contains 2 and group3 contains 2, totaling 8. seq1 represents sequences from groups 1 and 2, seq2 represents sequences from groups 1 and 3, and seq4 represents sequences from group1. Does this clear things up?
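If you want to check this kind of bookkeeping yourself, the same counts can be reproduced with a few lines of Python. This is just an illustrative sketch, not something mothur provides: the parsing functions are hypothetical helpers, and the data are the example names and group files from this post pasted in as strings.

```python
from collections import Counter

# Example files from this post, pasted in as text.
names_text = """seq1 seq1,seq3,seq5,seq6
seq2 seq2,seq7,seq8
seq4 seq4"""

group_text = """seq1 group1
seq2 group1
seq3 group1
seq4 group1
seq5 group2
seq6 group2
seq7 group3
seq8 group3"""


def parse_names(text):
    """Map each representative sequence to the list of sequences it stands for."""
    reps = {}
    for line in text.splitlines():
        if line.strip():
            rep, members = line.split()
            reps[rep] = members.split(",")
    return reps


def parse_groups(text):
    """Map each sequence name to its group."""
    groups = {}
    for line in text.splitlines():
        if line.strip():
            name, group = line.split()
            groups[name] = group
    return groups


names = parse_names(names_text)
groups = parse_groups(group_text)

unique_count = len(names)                              # sequences in the fasta file
total_count = sum(len(m) for m in names.values())      # sequences in the group file
group_counts = Counter(groups.values())                # sequences per group
groups_per_rep = {
    rep: sorted({groups[m] for m in members})          # groups behind each unique sequence
    for rep, members in names.items()
}

print(unique_count, total_count)
print(dict(group_counts))
print(groups_per_rep)
```

The totals from the names file and from the group file should agree; if they do not, some sequence is missing from one of the two files.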


Thanks Sarah! :wink: