Multiple fasta files

Hello,

I have used mothur successfully when I have been given the raw data off of the MiSeq and start with the command “make.contigs”. This time I had my samples sequenced at the Broad and was given a list of fasta files for each of my samples. I am not sure where to start in the MiSeq SOP with the fasta files or which command to use. I did make a groups file that looks like this:

WTRSD1_1 A7RLJ.1.Solexa-239840.splice.fasta
WTRCD1_2 A7RLJ.1.Solexa-239841.splice.fasta
WTRSD1_4 A7RLJ.1.Solexa-239843.splice.fasta
WTRCD1_6 A7RLJ.1.Solexa-239845.splice.fasta

I was only going to try 4 samples out of the 200 or so that I received. I need help! Do the fasta files need to be merged into one? Can you give me the exact command to use?

Thanks for your help.

Marnie

Is each fasta file a different sample?

That doesn't look like a valid group file; see here: http://www.mothur.org/wiki/Group_file

If each file is a different sample, then you probably want to do something like this:

make.group(fasta=A7RLJ.1.Solexa-239840.splice.fasta-A7RLJ.1.Solexa-239841.splice.fasta-A7RLJ.1.Solexa-239843.splice.fasta-A7RLJ.1.Solexa-239845.splice.fasta, groups=WTRSD1_1-WTRCD1_2-WTRSD1_4-WTRCD1_6)

merge.files(input=A7RLJ.1.Solexa-239840.splice.fasta-A7RLJ.1.Solexa-239841.splice.fasta-A7RLJ.1.Solexa-239843.splice.fasta-A7RLJ.1.Solexa-239845.splice.fasta, output=allsamples.fasta)

Each fasta file is a different sample and I understand how to solve that from your explanation. Thank you. I’m still not clear on the groups file. Does the sequence name go first and then the sample name? Do I take out the .fasta part? Or are you telling me how to make the groups file in your example with the first command? Sorry…thank you!!!


A7RLJ.1.Solexa-239840.splice.fasta WTRSD1_1
A7RLJ.1.Solexa-239841.splice.fasta WTRCD1_2
A7RLJ.1.Solexa-239843.splice.fasta WTRSD1_4
A7RLJ.1.Solexa-239845.splice.fasta WTRCD1_6

Yeah, the first command will make the groups file. The format of a group file is

Sequence_name group_name

This is repeated with one line for every sequence (not one line per file). Since you probably have many thousands of sequences, it isn't practical to make the file by hand. The make.group command will make one for you.
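To make the "one line per sequence" point concrete, here is a rough Python sketch of what such a file contains. It is only an illustration, not how make.group is implemented. It assumes standard FASTA headers that start with ">" and a tab between the two columns; the function name group_lines and the tiny in-memory example are made up for this post.

```python
import io

def group_lines(fasta_handle, group_name):
    """Yield one 'sequence_name<TAB>group_name' line per FASTA record."""
    for line in fasta_handle:
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token after '>'.
            tokens = line[1:].split()
            if tokens:
                yield f"{tokens[0]}\t{group_name}"

# Hypothetical stand-in for one sample's fasta file (two short records).
example = io.StringIO(">0\nTACGGAGG\nGGGGCTCA\n>1\nTACGGAGG\n")
lines = list(group_lines(example, "WTRSD1_1"))
print("\n".join(lines))
```

Every record in the fasta file gets its own line, so a file with thousands of sequences yields thousands of lines that all share the same group name.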

New problem…

I ran both the make.group and merge.files as you suggested and both worked fine. However, now I have a new problem. Within the fasta files the sequences are numbered. Here is an example.

0
TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGCAGGCGGATTTTTAAGTCAGCGGTCAAATCGT
GGGGCTCAACCCCATCCAGCCGTTGAAACTGGGGATCTAGAGTGTGCGAGAGGTATGCGGAATGCGTGGTGTAGCGGTGA
AATGCATAGATATCACGCAGAACCCCGATTGCGAAGGCAGCATACCGGTGCACAACTGACGCTCAGGCACGAAAGCGTGG
GTAGCGAACAGG
1
TACGGAGGATGCAAGCGTTATCCGGATTTACTGGGTTTAAAGGGTGCGTAGGCGGAAGAGCAAGTCAGAGGTGAAATTCT
GTGGCTCAACCTCTGCACTGCATTGGAAACTGTTTTACTGGAGTGAGGGAGAGGTATGTGGAATGCGTGGTGTAGCGGTG
AAATGCGTAGATATCAGGAGGAACACCGATGGCGAAGGCGGCTTACTGGAGTGTAACTGACGCTGAGGCACGAAAGCGTG
GGGAGCAAACAGG
2
TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGCAGGCGGAAGATCAAGTCAGCGGTAAAATTGA
GAGGCTCAACCTCTTCGATCCGTTGAAACTGGTTTTCTTGAGTGAGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGA
AATGCATAGATATCACGCAGAACTCCGATTGCGAGGGCAGCATACCGGCGCTCAACTGACGCTCATGCACGAAAGTGTGG
GTATCGAACAGG
3
TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGCCTGCCAAGTCAGCGGTAAAATTGC
GGGGCTCAACCCCGTACAGCCGTTGAAACTGCCGGGCTCGAGTGGGCGAGAAGTATGCGGAATGCGTGGTGTAGCGGTGA
AATGCATAGATATCACGCAGAACCCCGATTGCGAAGGCAGCATACCGGCGCCCGACTGACGCTGAGGCACGAAAGTGCGG
GGATCAAACAGG
4
TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTCTGTTAAGTCAGCGGTAAAATTGC
GGGGCTCAACCCCGTCGAGCCGTTGAAACTGGCAGACTTGAGTTGGCGAGAAGTACGCGGAATGCGCGGTGTAGCGGTGA
AATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCGTACCGGCGCCAGACTGACGCTGAGGCACGAAAGCGTGG
GGAGCGAACAGG

The next command I used was unique.seqs, and because every fasta file numbers its sequences starting from the same values, I get an error saying I already have a sequence named 4 (for example).

How do I fix this problem, and is unique.seqs the correct command to use next?

Thank you,
Marnie

I would get Broad to give you the original fastq files. It sounds like they’ve created quite a mess for you.
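If the original fastq files can't be had, one possible workaround (a sketch outside of mothur, not something from the SOP) is to make the names unique before merging by putting the sample name in front of each sequence name. The Python below assumes standard FASTA headers that start with ">"; the function name prefix_names and the toy sequences are made up for illustration. The group file would have to be regenerated afterwards, since the sequence names change.

```python
import io

def prefix_names(fasta_handle, sample_name):
    """Yield FASTA lines with each header rewritten as '>sample_oldname'."""
    for line in fasta_handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Prepend the sample name so names no longer collide across files.
            yield f">{sample_name}_{line[1:]}"
        else:
            yield line

# Hypothetical stand-ins for two per-sample fasta files that both contain a sequence "0".
sample_a = io.StringIO(">0\nTACG\n>1\nGGCA\n")
sample_b = io.StringIO(">0\nTTAG\n")

merged = list(prefix_names(sample_a, "WTRSD1_1")) + list(prefix_names(sample_b, "WTRCD1_2"))
names = [entry[1:] for entry in merged if entry.startswith(">")]
print("\n".join(merged))
```

With real data you would open each sample's file in place of the StringIO objects and write the merged lines to a single fasta file.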