Merge fasta and group files

Hi

I am trying to merge a fasta file and a group file that look like this:

  1. fasta file:

HCZRAFT02HJAN7
A-G-A-T----AAC------AG-C-C-C-AC-C-A-----A-GG-C-G--A-T--G-A---T--AC-A--T--A--G---C-T-G-G-TCT--G-AG---A--G-G-AT--G-AA-C-A-G-CCA-C-…
HCZRAFT02HUO5K
A-G-G-T----AAC------GG-C-T-C-AC-C-A-C…

  2. and a group file specifying which sequence is found in which sample:

HCZRAFT02JI33I C_129
HCZRAFT02IMY3H C_129


I'd like to end up with a file that looks like this:

C_129_HCZRAFT02JI33I
A-G-A-T----AAC------AG-C-C-C-AC-C-A-----A-GG-C-G--A-T--G-A---T--AC-A--T--A--G---C-T-G-G-TCT--G-AG---A--G-G-AT--G-AA-C-A-G-CCA-C-…
C_129_HCZRAFT02IMY3H
-G-G-T----AAC------GG-C-T-C-AC-C-A-C…

Would the merge.files command in mothur work for this?

Any help would be greatly appreciated

Andres

Yep, have you tried running merge.files yet?

Yes. It basically gave me all the sequences first and then all the info from the group file, so not quite what we wanted. I want each sequence name in my fasta file to start with the sample name, like:

C_129_HCZRAFT02HXC4S

from these 2 files:

Group
HCZRAFT02HXC4S C_129

Seq

HCZRAFT02J3HFZ
GGAACTGAGACGACCGGTCCAGACCTCCGTACGGGGAGGCAGCAGGTGGGGAATCTTC

Sorry, I don’t understand what you’re trying to do. Can you also provide the actual command you called?

Thanks Pat.

I have come up with a Perl script that took me a long time to write and that worked well for what I wanted, but it’d be great if mothur could do this:

I have a fasta file and a group file. My fasta file does not have any information on sample affiliation, just a sequence ID that looks like this:

HCZRAFT02IZ45I
AAGGCAACGATCAGTAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACTGAGACACGGTCCAAACTCCTACGGGAGGCAGCAGTGAGGAATATTGGTCAATGGGCGAGAGCCTGAACCAGCCAAGTAGCGTGCAGGACGACGGCCCTATGGGTTGTAAACTGCTTTTATAGGGGAATAAAGTGAGCCACGTGTGGCTTTTTGCATGTACCCTATGAATAAGGACCGGCTAATTCCG
HCZRAFT02JHPKN
ATTACCGCGCTGCTGGCACGTAGTAAGCCGATGCTTCCTCAGTAGGTACCGTCCATTCTCGTCCCCACCTGACAAAGGTTTAACAATCCGAAGACC…


My group file has information on which sequence belongs to which sample:

HCZRAFT02IZ45I C_129
HCZRAFT02JHPKN C_128

I want to merge the 2 files into one so that my fasta file incorporates the sample ID before the sequence ID, like this:

C_129_HCZRAFT02IZ45I
AAGGCAACGATCAGTAGGGGTTCTGAGAGGAAGGTCCCCCACATTGGAACTGAGACACGGTCCAAACTCCTACGGGAGGCAGCAGTGAGGAATATTGGTCAATGGGCGAGAGCCTGAACCAGCCAAGTAGCGTGCAGGACGACGGCCCTATGGGTTGTAAACTGCTTTTATAGGGGAATAAAGTGAGCCACGTGTGGCTTTTTGCATGTACCCTATGAATAAGGACCGGCTAATTCCG
C_128_HCZRAFT02JHPKN
ATTACCGCGCTGCTGGCACGTAGTAAGCCGATGCTTCCTCAGTAGGTACCGTCCATTCTCGTCCCCACCTGACAAAGGTTTAACAATCCGAAGACC…
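For anyone who wants to do this outside mothur, the relabelling described above can be sketched in a few lines of Python. This is only a minimal illustration, not the Perl script mentioned in this post and not a mothur feature. It assumes a standard fasta file whose header lines start with ">", a whitespace-separated group file (sequence ID, then sample), and the hypothetical function names `read_groups` and `relabel_fasta`. The sequences in the demo are shortened for illustration.

```python
def read_groups(lines):
    """Map sequence ID -> sample name from 'seq_id<whitespace>sample' lines."""
    groups = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            groups[parts[0]] = parts[1]
    return groups


def relabel_fasta(fasta_lines, groups, sep="_"):
    """Yield fasta lines with each header rewritten as >sample_seqid."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            sample = groups.get(seq_id)
            # Assumption: sequences missing from the group file keep their name.
            yield f">{sample}{sep}{seq_id}" if sample else line
        else:
            yield line  # sequence lines pass through unchanged


if __name__ == "__main__":
    groups = read_groups(["HCZRAFT02IZ45I C_129", "HCZRAFT02JHPKN C_128"])
    fasta = [">HCZRAFT02IZ45I", "AAGGCAACG", ">HCZRAFT02JHPKN", "ATTACCGCG"]
    for out in relabel_fasta(fasta, groups):
        print(out)
```

With real files, the two lists would be replaced by open file handles, and the output written to a new fasta file rather than printed.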

The whole idea is to be able to use a curated fasta file to get QIIME to give me a .biom OTU table to use in PICRUSt later. It’d be great if mothur had that option. I always use mothur to curate my sequences because I believe I get much better quality data, but QIIME and Galaxy have other very interesting analysis tools.

Thanks

Andres

So I used:
merge.files(input=All_Howler.fasta-All_Howler.groups, output=Merged.fasta)

We had a similar request in the thread "Manipulate Sequence Identifiers". The rename.seqs command will be part of 1.32.0. The command appends the group name to the original sequence name.

HCZRAFT02JI33I C_129 would become HCZRAFT02JI33I_C_129.
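Note that this puts the group after the sequence name, whereas the format requested earlier in the thread puts the sample first. The difference between the two naming schemes can be illustrated with a small Python sketch. The function names `appended_name` and `prefixed_name` are hypothetical and only show the string layout; this is not mothur code.

```python
def appended_name(seq_id, group, sep="_"):
    # Layout described for rename.seqs: group appended to the sequence name.
    return f"{seq_id}{sep}{group}"


def prefixed_name(seq_id, group, sep="_"):
    # Layout requested in the original question: sample name first.
    return f"{group}{sep}{seq_id}"


seq_id, group = "HCZRAFT02JI33I C_129".split()
print(appended_name(seq_id, group))  # HCZRAFT02JI33I_C_129
print(prefixed_name(seq_id, group))  # C_129_HCZRAFT02JI33I
```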