Hi,
I’m trying to run the following commands:
trim.flows(flow=RTL2011.08.flow, oligos=RTL2011.08.arc.oligos, bdiffs=2, pdiffs=3, minflows=350, maxflows=750, processors=2)
shhh.flows(file=RTL2011.08.flow.files, processors=2)
trim.seqs(fasta=RTL2011.08.shhh.fasta, name=RTL2011.08.shhh.names, oligos=RTL2011.08.arc.oligos, pdiffs=3, bdiffs=2, maxhomop=8, minlength=200, flip=T, processors=2)
Everything seems to be working fine. The trouble is when I then try to run:
unique.seqs(fasta=RTL2011.08.shhh.trim.fasta, name=RTL2011.08.shhh.trim.names)
I get loads (thousands) of errors saying:
[ERROR]: You already have a sequence named G9SS7BA01CUVAI in your fasta file, sequence names must be unique, please correct.
but the process finishes without crashing.
Searching the original RTL2011.08.shhh.fasta file for any of these accession numbers, I see that most (maybe all) of my sequences do indeed have an identical duplicate elsewhere in the file (always 514 lines later).
Now, I can of course remove them manually with a short Perl script (and do the same for the names file, where I also have duplicate lines), but that really doesn’t sound like the best way to go about this.
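For reference, the manual cleanup I have in mind would look roughly like the sketch below. It is written in Python rather than Perl, the function names are just placeholders of mine, and it assumes the duplicates are exact repeats, so that keeping the first copy of each name is safe. It also assumes the names file is tab-separated with the sequence name in the first column.

```python
def dedupe_fasta(lines):
    """Keep only the first record for each sequence name in a list of FASTA lines."""
    seen = set()
    kept = []
    keep_current = False
    for line in lines:
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token of the header.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            keep_current = name not in seen
            seen.add(name)
        if keep_current:
            kept.append(line)
    return kept


def dedupe_names(lines):
    """Keep only the first line for each name (first tab-separated column)."""
    seen = set()
    kept = []
    for line in lines:
        key = line.rstrip("\n").split("\t", 1)[0]
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Tiny in-memory example; real use would read the lines from the files on disk.
    fasta = [">seq1\n", "ACGT\n", ">seq2\n", "GGCC\n", ">seq1\n", "ACGT\n"]
    names = ["seq1\tseq1,seq3\n", "seq2\tseq2\n", "seq1\tseq1,seq3\n"]
    print("".join(dedupe_fasta(fasta)), end="")
    print("".join(dedupe_names(names)), end="")
```

This only hides the symptom, of course. It doesn’t explain why the duplicates are written in the first place.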
The phenomenon is reproducible across multiple runs. I tried to see if the same happens with Schloss’ SOP data, but the GQY1XT001.flow file isn’t available.
Any ideas as to why this is happening?
Thanks in advance,
Roey