Everything seems to be working fine. The trouble is when I then try to run:
I get loads (thousands) of errors saying:
[ERROR]: You already have a sequence named G9SS7BA01CUVAI in your fasta file, sequence names must be unique, please correct.
but the process finishes without crashing.
Searching the original RTL2011.08.shhh.fasta file for any of these accession numbers, I see that most (maybe all) of my sequences do have an identical duplicate somewhere in the file (always 514 lines later).
Now, I can of course remove them manually with a short Perl script (and do the same for the names file, where I also have duplicate lines), but that really doesn’t sound like the best way to go about this.
The phenomenon is reproducible across multiple runs. I tried to see if the same happens with Schloss’ SOP data, but the GQY1XT001.flow file isn’t available.
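For anyone wanting to confirm the extent of the duplication described above, here is a minimal shell sketch. It assumes each FASTA header line starts with ">" and that the sequence name is the first whitespace-delimited token on that line; `count_dup_names` is a made-up helper name, not a mothur command.

```shell
# count_dup_names FASTA
# Print how many distinct sequence names occur more than once in FASTA.
# Assumption: the name is the first token after ">" on each header line.
count_dup_names() {
    grep '^>' "$1" |                      # keep header lines only
        sed 's/^>//; s/[[:space:]].*$//' | # strip ">" and anything after the name
        sort |
        uniq -d |                         # one line per name seen 2+ times
        awk 'END { print NR }'            # count those names
}

# Example, run from a shell outside mothur:
# count_dup_names RTL2011.08.shhh.fasta
```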
The trim.flows command is adding the individual flow file names to RTL2011.08.flow.files more than once. This occurs when you have primers or barcodes with the same name. In your case, both primers had a blank name. We will correct this in our next release. In the meantime, after you run trim.flows, run “sort -u RTL2011.08.flow.files -o RTL2011.08.flow.files” to remove the duplicate names before you run shhh.flows. Thanks for bringing this to our attention!
Sorry for not being more specific. The “sort -u RTL2011.08.flow.files -o RTL2011.08.flow.files” line is not a mothur command. You have to run it outside of mothur, from a shell. It sorts the lines in the flow files list and writes only the unique lines back to the same file. The -o option is used instead of a “> RTL2011.08.flow.files” redirect because the shell would empty the file before sort could read it. Basically, it removes the duplicates for you.
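If you would rather keep the flow file names in their original order instead of sorting them, the following is one possible sketch. `dedupe_in_place` is a hypothetical helper name, and it assumes `mktemp` and `awk` are available in your shell. It writes to a temporary file first so the original is never read and overwritten at the same time.

```shell
# dedupe_in_place FILE
# Remove repeated lines from FILE, keeping the first occurrence of each
# line and preserving the original order.
dedupe_in_place() {
    file=$1
    tmp=$(mktemp) || return 1
    # awk prints a line only the first time it is seen
    if awk '!seen[$0]++' "$file" > "$tmp"; then
        cat "$tmp" > "$file"   # copy back, keeping FILE's permissions
    fi
    rm -f "$tmp"
}

# Example, run from a shell (not inside mothur), after trim.flows
# and before shhh.flows:
# dedupe_in_place RTL2011.08.flow.files
```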
You are correct about the primer name. Mothur is looking for something like: