Modifying sequence IDs with Group Name

Hi Pat and Sarah,

I don’t know if this already exists, but is there a way of prepending the group of a sequence to its > identifier? That is, take the groups file, the fasta and the names file, and for a sequence whose ID is >HIGFX12HGSR4 and whose group is Soil1, write >Soil1_HIGFX12HGSR4 instead. I imagine there is a quick way of doing this from the command line in a Unix environment, but it eludes me.

If this is already a feature, could you let me know how to do it? If not, could it be included in a future version of mothur? (Also, could you let me know how to do it via the CLI?)

Cheers,
Tris

imac:~ SarahsWork$ cat oldFile.fasta | sed 's:>\(.*\) group is \(.*\):>\2_\1:' > newFile.fasta

This will read oldFile.fasta:

>HIGFX12HGSR4 group is Soil1
TAAGACGAACCGTGCGAACGTTGTTCGGAATCACTGGGCTTAAAGGGCGCGTAGGCGGGCCATCAAGTCCGGGGTGAAAT
>ASD2YFHDOIDFJ group is SOil2
TAAGACGAACCGTGCGAACGTTGTTCGGAATCACTGGGCTTAAAGGGCGCGTAGGCGGGCCATCAAGTCCGGGGTGAAAT

and create newFile.fasta:

>Soil1_HIGFX12HGSR4
TAAGACGAACCGTGCGAACGTTGTTCGGAATCACTGGGCTTAAAGGGCGCGTAGGCGGGCCATCAAGTCCGGGGTGAAAT
>SOil2_ASD2YFHDOIDFJ
TAAGACGAACCGTGCGAACGTTGTTCGGAATCACTGGGCTTAAAGGGCGCGTAGGCGGGCCATCAAGTCCGGGGTGAAAT

I had the same question. This command only rewrites sequence headers that already contain the group inside the fasta file; it does not take the *.group and *.fasta files and use the former to change the latter’s headers.
Thanks.

Correct. Doing this from the fasta and group files would require a new mothur command or a custom script.
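
For anyone who wants to try the custom-script route, here is a minimal sketch using awk. It assumes the group file has two whitespace-separated columns (sequence name, then group name) and that the IDs in the fasta file match those names exactly. The function name prefix_groups is made up for illustration and is not a mothur command.

```shell
# Hypothetical helper (not part of mothur): prefix each fasta header
# with the group listed for that sequence in the group file.
# Usage: prefix_groups my.groups my.fasta > my.renamed.fasta
prefix_groups() {
    awk '
        # First file (group file): remember the group for each sequence name.
        NR == FNR { if (NF >= 2) group[$1] = $2; next }
        # Second file (fasta): rewrite the ID on header lines that have a group.
        /^>/ {
            id = substr($1, 2)
            if (id in group) $1 = ">" group[id] "_" id
        }
        # Sequence lines and headers without a known group pass through as-is.
        { print }
    ' "$1" "$2"
}
```

Headers whose ID is not in the group file are left unchanged, so they are easy to spot afterwards. Note that this only rewrites the fasta file: the names and group files still contain the old IDs and would need the same renaming before being used together with the new fasta.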