data missing?

Hello
I may be missing something, but the SOPData.zip archive (http://www.mothur.org/w/images/a/a1/SOPData.zip) linked from http://www.mothur.org/wiki/Schloss_SOP seems to be missing the main fasta and qual files, as well as several others needed to reproduce the listed results. GQY1XT001.qual, GQY1XT001.fasta, and likewise GQY1XT001.shhh.trim.fasta are missing from the archive.

As a result, I am unable to complete:
trim.seqs(fasta=GQY1XT001.fasta, oligos=GQY1XT001.oligos, qfile=GQY1XT001.qual, maxambig=0, maxhomop=8, flip=T, bdiffs=1, pdiffs=2, qwindowaverage=35, qwindowsize=50, processors=2)
unique.seqs(fasta=GQY1XT001.shhh.trim.fasta, name=GQY1XT001.shhh.trim.names)
etc.

Hi Casey,

You’re correct - we’ve provided the data needed to start from the flow files, not the fasta/qual files.

Pat

I am having trouble with my data files as barcodes and primers are not being found properly by any program.
The sequencing center has put NNNN in front of the barcode. In addition, I cannot really tell if they sequenced
with the forward or the reverse primer. I have tried using both, but 99.99% of sequences always end up in scrap.

It would be very helpful to have either the original GQY1XT001.sff or at least a fasta file for your SOP example
in order to see where the barcodes and primer sequences are expected to be for input to trim.seqs/flows.

Thanks

I’ve just posted the original SFF file as a separate download for those that might be interested. It’s about 800 MB and is a bz2 compressed file…

I suspect you’re getting your sequences from a sequencing center in Lubbock (this is their SOP; very annoying)? They have to tell you what the barcodes, primers, and direction are since everyone does it differently. If you pay for sequencing data, they should tell you how it was done.

Pat

Thank you for the sff file. I do not know Lubbock (except as a city in Texas), but you are correct: I need to discuss with the sequencing center what they have done, as it seems very non-standard.

Sadly, the mothur forum categories are:
Theory, Commands, bugs, and Feature requests.
Might we get a general Questions category? I hate to post in bugs when this is not mothur’s fault whatsoever :oops:

Anyway, in the meantime, I will continue to hijack this thread. :wink:

The sequencing center replied that they sequence in both directions. Thus I have a single sff file with 20 different barcodes and 2 different primers. Sometimes the forward primer is at the 5’ end of the reads (in the 5’-3’ direction) and sometimes the reverse primer is at the 5’ end (also in the 5’-3’ direction). I also checked whether I could find the reverse or the reverse complement of either primer sequence in the first 50 bases. The reverse sequence was never found (whew), but I did find the reverse complement of each primer at the beginning of the reads 70 times each. (?? - out of 1.5 million reads, so I guess these reads are just destined for the scrap bin anyway.)
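
For anyone wanting to run the same kind of check, here is a minimal Python sketch of it. It is not part of mothur, and the names (`classify_read`, `primer_regex`, the 50-base `window`) are made up for illustration. The primer in the usage note is a commonly cited 27F variant used purely as a placeholder, not the primers from this run. The sketch only reports which orientation of a primer shows up near the start of a read, allowing IUPAC degenerate bases in the primer but no mismatches.

```python
import re

# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a primer into a regex, one character class per position."""
    return re.compile("".join("[" + IUPAC[base] + "]" for base in primer.upper()))


def classify_read(read, primer, window=50):
    """List the primer orientations found in the first `window` bases."""
    head = read.upper()[:window]
    orientations = {
        "forward": primer.upper(),
        "reverse": primer.upper()[::-1],
        "reverse_complement": reverse_complement(primer),
    }
    return [
        name
        for name, oriented in orientations.items()
        if primer_regex(oriented).search(head)
    ]
```

Calling `classify_read(read, "AGAGTTTGATCMTGGCTCAG")` on each read and tallying the results per orientation gives the kind of counts quoted above. Because matching is exact, reads with a sequencing error inside the primer will be missed, so the tallies are a lower bound.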

So, in order to follow the Schloss SOP, I am guessing that I need to make 2 oligo files, one with the forward primer and one with the reverse, run trim.flows with each, run shhh.flows 2x and trim.seqs 2x, then run reverse.seqs() on the fasta that came from the oligo file with the reverse primer, and finally merge the two sets prior to unique.seqs().

Does that sound correct? Am I alone in having this type of 454 (FLX Titanium) sequencing performed in both directions?

So, in order to follow the Schloss SOP, I am guessing that I need to make 2 oligo files, one with the forward primer and one with the reverse, run trim.flows with each, run shhh.flows 2x and trim.seqs 2x, then run reverse.seqs() on the fasta that came from the oligo file with the reverse primer, and finally merge the two sets prior to unique.seqs().

Yeah, you can do it that way. Alternatively, you can give your forward primers a name like you do for the barcodes and only run trim.flows once. You will have to merge your output files manually, however.
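
To make the "reverse one set, then merge" step concrete, here is a rough Python sketch of the logic only. It is not a replacement for mothur's own commands, and the function names (`parse_fasta`, `merge_oriented`) are hypothetical. It assumes the reads from the reverse-primer set need to be reverse complemented so that both sets end up in the same orientation, and it ignores the names and groups files, which would have to be merged as well.

```python
# Complement table covering the standard bases and IUPAC degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def merge_oriented(forward_text, reverse_text):
    """Merge two FASTA sets, reverse complementing the second one."""
    merged = parse_fasta(forward_text)
    merged += [
        (name, reverse_complement(seq)) for name, seq in parse_fasta(reverse_text)
    ]
    return "".join(f">{name}\n{seq}\n" for name, seq in merged)
```

The merged text can then be written to a single fasta file. Read names are kept unchanged, so they still line up with whatever names or groups information is merged alongside.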

Am I alone in having this type of 454 (FLX Titanium) sequencing performed in both directions?

I’m not sure that alone is the right term, but there aren’t a lot of people on your island :). The problem with this strategy is that since your samples don’t sound like they are paired, you won’t be able to make contigs from them. People generally are trying to get more sequence reads and are willing to take a hit on read length. Of course we all want longer reads, but people really want a lot of reads. Your center’s strategy basically cuts the number of reads in half. If you look at the papers doing 454, I don’t think you’ll find many (any?) doing it this way. That doesn’t mean you can’t, it’s just different.

Hope this helps,
Pat

I finally met with the sequencing center and asked about this. They basically told me that they had problems sequencing in a single direction, whereas sequencing in both directions worked. In addition, they said that the current Roche kits do not do sequencing in a single direction. :?: This sounds totally suspect to me. I am trying to get them to use the HMP DACC 16S protocol, where 27F/534R primers are used, Hamming/Golay barcodes are added to the R primer only, and sequencing is in a single direction from that R primer. They looked at me like I was crazy.

I am going to put up a post about this on SEQanswers but would love to hear what you have to say about it.

You might take them the stack of papers that sequence in one direction and then look at them like they’re crazy :slight_smile:

Sadly, I tend to look at people like they’re crazy all the time, so it would not trigger any response.

In case anyone wants to follow this discussion at SEQanswers:

http://seqanswers.com/forums/showthread.php?t=16711

and a thread I should have found if I had searched first:

http://seqanswers.com/forums/showthread.php?t=11612