I don't know if this already exists, but is there a way to prepend the group of each sequence to its FASTA identifier (the header line starting with >)? That is, take the group file along with the fasta and names files and, for a sequence whose ID is >HIGFX12HGSR4 and whose group is Soil1, output >Soil1_HIGFX12HGSR4 instead. I imagine there is a quick way to do this from the Unix command line, but it eludes me.
If this is already a feature, could you let me know how to use it? If not, could it be included in a future version of mothur? (Also, could you let me know how to do it via the CLI?)
imac:~ SarahsWork$ sed -E 's:^>(.*) group is (.*):>\2_\1:' oldFile.fasta > newFile.fasta
This will read oldFile.fasta, which looks like this:
>HIGFX12HGSR4 group is Soil1
TAAGACGAACCGTGCGAACGTTGTTCGGAATCACTGGGCTTAAAGGGCGCGTAGGCGGGCCATCAAGTCCGGGGTGAAAT
>ASD2YFHDOIDFJ group is SOil2
TAAGACGAACCGTGCGAACGTTGTTCGGAATCACTGGGCTTAAAGGGCGCGTAGGCGGGCCATCAAGTCCGGGGTGAAAT
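For anyone wanting to try the sed approach, here is a self-contained sketch that builds the sample file above and runs the command on it. It assumes a sed that supports -E (both macOS/BSD sed and GNU sed do); -E lets the parentheses capture without backslashes, which is why the backreferences \1 and \2 work.

```shell
#!/bin/sh
# Recreate the example FASTA whose headers already contain "group is <name>"
cat > oldFile.fasta <<'EOF'
>HIGFX12HGSR4 group is Soil1
TAAGACGAACCGTGCGAACGTTGTTCGGAATCACTGGGCTTAAAGGGCGCGTAGGCGGGCCATCAAGTCCGGGGTGAAAT
>ASD2YFHDOIDFJ group is SOil2
TAAGACGAACCGTGCGAACGTTGTTCGGAATCACTGGGCTTAAAGGGCGCGTAGGCGGGCCATCAAGTCCGGGGTGAAAT
EOF

# \1 captures the sequence ID, \2 the group name; only lines starting
# with ">" and containing " group is " are rewritten
sed -E 's:^>(.*) group is (.*):>\2_\1:' oldFile.fasta > newFile.fasta
cat newFile.fasta
```

The headers in newFile.fasta become >Soil1_HIGFX12HGSR4 and >SOil2_ASD2YFHDOIDFJ, and the sequence lines are passed through unchanged.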
I had the same question. This command only rewrites FASTA headers that already contain the group name (e.g. ">HIGFX12HGSR4 group is Soil1"); it does not take the *.group and *.fasta files and use the former to change the latter's headers.
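One way to do the join the original question asks for is a two-file awk pass. This is a sketch, not a mothur feature: it assumes the group file has one "sequence name, whitespace, group" pair per line and that the FASTA headers start with the bare sequence name. The file names (example.groups, example.fasta, example.renamed.fasta) and the group assignments are hypothetical. The names file is not used here; if the FASTA has been through unique.seqs, each representative simply gets the group listed for its own name.

```shell
#!/bin/sh
# Hypothetical inputs: a mothur-style group file and a FASTA with plain IDs
printf 'HIGFX12HGSR4\tSoil1\nASD2YFHDOIDFJ\tSoil2\n' > example.groups
cat > example.fasta <<'EOF'
>HIGFX12HGSR4
TAAGACGAACCGTGCGAACGTTGTTCGGAATCACTGGGCTTAAAGGGCGCGTAGGCGGGCCATCAAGTCCGGGGTGAAAT
>ASD2YFHDOIDFJ
TAAGACGAACCGTGCGAACGTTGTTCGGAATCACTGGGCTTAAAGGGCGCGTAGGCGGGCCATCAAGTCCGGGGTGAAAT
EOF

# Pass 1 (NR == FNR, still reading the first file): map sequence name -> group.
# Pass 2: for each header, take the first word minus ">" as the ID and,
# if it has a group, rewrite the header as >group_ID (keeping any description).
# Sequence lines and headers without a group entry are printed unchanged.
awk 'NR == FNR { group[$1] = $2; next }
     /^>/ {
         id = substr($1, 2)
         if (id in group) $0 = ">" group[id] "_" substr($0, 2)
     }
     { print }' example.groups example.fasta > example.renamed.fasta

cat example.renamed.fasta
```

With these inputs the headers become >Soil1_HIGFX12HGSR4 and >Soil2_ASD2YFHDOIDFJ. The group file must come first on the command line, because the NR == FNR test is what tells awk it is still reading the lookup table.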
Thanks.