I have used mothur successfully when I was given the raw data off the MiSeq and started with the command “make.contigs”. This time I had my samples sequenced at the Broad and was given a list of fasta files, one for each of my samples. I am not sure where to start in the MiSeq SOP with the fasta files or which command to use. I did make a groups file that looks like this:
I was only going to try 4 samples out of the 200 or so that I received. I need help! Do the fasta files need to be merged into one? Can you give me the exact command to use?
Thanks for your help.
Is each fasta file a different sample?
That doesn't look like a valid group file. See here: http://www.mothur.org/wiki/Group_file
If each file is a different sample, then you probably want to run make.group on the fasta files first and then merge.files to combine them into one fasta file.
Each fasta file is a different sample and I understand how to solve that from your explanation. Thank you. I’m still not clear on the groups file. Does the sequence name go first and then the sample name? Do I take out the .fasta part? Or are you telling me how to make the groups file in your example with the first command? Sorry…thank you!!!
Yeah, the first command will make the groups file. The format of a group file is the sequence name, then a tab, then the group (sample) name, repeated with one line for every sequence (not one line per file). Since you probably have many thousands of sequences, it isn't practical to make the file by hand. The make.group command will make one for you.
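To make the layout concrete, here is a minimal Python sketch of what a group file contains. It is only an illustration of the format, not how make.group is implemented; the function name `group_lines` and the sample name `sampleA` are made up for the example.

```python
def group_lines(fasta_text, sample):
    """Return one 'sequence_name<TAB>sample' line per FASTA header."""
    lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Take the first whitespace-delimited token after '>' as the name.
            parts = line[1:].split()
            if parts:
                lines.append(f"{parts[0]}\t{sample}")
    return lines


if __name__ == "__main__":
    # Hypothetical two-sequence fasta file belonging to one sample.
    example = ">seq1\nACGT\n>seq2 some description\nTTGA\n"
    print("\n".join(group_lines(example, "sampleA")))
```

Every sequence gets its own line, so a file with thousands of reads produces thousands of lines that all carry the same sample name.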
I ran both the make.group and merge.files as you suggested and both worked fine. However, now I have a new problem. Within the fasta files the sequences are numbered. Here is an example.
The next command I used was unique.seqs, and because the sequences in every fasta file are numbered the same way, I get an error saying I already have a sequence named 4 (for example).
How do I fix this problem, and is unique.seqs the correct command to use next?
I would get Broad to give you the original fastq files. It sounds like they’ve created quite a mess for you.
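If the fastq files can't be had, one possible workaround is to make the names unique yourself before merging. The Python sketch below is an assumption on my part, not a mothur feature: it prefixes each sequence name with its sample name and builds a matching group file. The function name `make_unique` and the underscore separator are illustrative choices.

```python
def make_unique(samples):
    """samples maps sample name -> FASTA text.

    Returns (merged_fasta, group_file) as strings, with every sequence
    name rewritten as '<sample>_<original name>'.
    """
    fasta_out, group_out = [], []
    for sample, text in samples.items():
        for line in text.splitlines():
            if line.startswith(">"):
                parts = line[1:].split()
                if not parts:
                    raise ValueError(f"empty sequence name in sample {sample}")
                # Assumed naming scheme: sample prefix plus original name.
                new_name = f"{sample}_{parts[0]}"
                fasta_out.append(">" + new_name)
                group_out.append(f"{new_name}\t{sample}")
            elif line.strip():
                # Sequence lines are copied through unchanged.
                fasta_out.append(line.strip())
    return "\n".join(fasta_out) + "\n", "\n".join(group_out) + "\n"


if __name__ == "__main__":
    # Two hypothetical samples that both contain a sequence named 4.
    merged, groups = make_unique({"A": ">4\nACGT\n", "B": ">4\nTTTT\n"})
    print(merged, groups, sep="")
```

The group file has to use the new names too, so a group file made from the original fasta files would no longer match and would need to be rebuilt. This only addresses the duplicate names; it does not recover whatever quality information was lost when the fastq files were converted.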