Costello data set example problems

Hi Pat,

I have been following the Costello data set example, and it has been a great resource for processing my data.

I was wondering if the older pipeline (with the file names written in, versus all files as current) is still posted somewhere? I am having a difficult time with the new pipeline. I think the use of current is a great feature when processing samples from beginning to end, but it is problematic when running each command one step at a time and processing multiple sample files at each step before moving on to the next step.

My general protocol is to test the appropriate command on one of my sequence files and then check the output. If this looks okay, I build a batch file for that step or command, run the 14 other samples, then make a histogram of what comes out, compare the percentage of the sequences lost, and ask whether this makes sense. Doing this, I can figure out what I have to tweak at each step, I can find any problems present in my dataset, and I understand what is happening at each step along the way.
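To make the per-sample check concrete, here is a minimal sketch in Python of the "percentage of sequences lost" comparison. It assumes each step reads and writes FASTA files; the function names and the (sample, input, output) layout are hypothetical illustrations, not part of mothur or the Costello example.

```python
def count_fasta_sequences(path):
    """Count records in a FASTA file by counting header lines starting with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def percent_lost(before, after):
    """Percentage of sequences removed by a step; 0.0 if the input was empty."""
    if before == 0:
        return 0.0
    return 100.0 * (before - after) / before


def summarize_step(pairs):
    """pairs: iterable of (sample_name, input_fasta, output_fasta) tuples.

    Returns a dict mapping each sample name to the percentage of
    sequences lost between the input and output file of one step.
    """
    summary = {}
    for name, fasta_in, fasta_out in pairs:
        before = count_fasta_sequences(fasta_in)
        after = count_fasta_sequences(fasta_out)
        summary[name] = percent_lost(before, after)
    return summary
```

Running summarize_step once per command, over all of the samples, gives one number per sample that can be plotted or scanned for outliers, such as a sample that loses far more sequences than the rest.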

Initially I tried merging all of the files and then processing them, but then I lost the ability to discover the problems in an individual sample or run. I also discovered that for one of the samples the incorrect region was sequenced, and trying to process the samples with the entire pipeline caused significant problems.

With the new format of the Costello dataset pipeline I am having a more difficult time figuring out which step the input files come from, etc., and it has become a little more of a black box. (I suppose that is often the case when programs are made more user friendly.) I can go back to the manual and this has helped, but it takes a lot longer for me to figure out what is going on at each step.

I like the current option, as it makes for less typing when processing a single sample, but I was hoping that the old pipeline could also be posted or made available?

Thanks,
Emily

Hi Emily,

You make a very good point. We’ll roll it back in the next week or so…
Pat