Analyzing hundreds of datasets simultaneously

Hi,
After working with mothur using my own samples, I would like to work with 454 datasets available on the web (NCBI SRA, MG-RAST, etc.). I have downloaded ~200 454 datasets. I could go through the Schloss SOP 200 times and then compare the results, but that would take a long time. Is there a way to process them all simultaneously? Would merging them all into a single dataset cause memory problems? If merging is not an option, is there perhaps a script that runs the 200 datasets one after the other automatically?
Thanks!

I’m not a programmer, but I have managed to run >500 separate datasets through the first few steps of the SOP using a “for” loop and mothur’s command line mode:

http://www.mothur.org/wiki/Command_line_mode

Interesting! This is immensely helpful.
Can you please post your entire script?
How are your datasets organized?
Thanks again…

This is it so far. My files are all in the folder that I’m in before running this. Also, my bacterial sff files start with B and my eukaryotic ones with E, hence the separate oligos files and commands. I’m kind of making this up as I go along, so no flames if it doesn’t work for you.

for n in *.sff; do mothur "#sff.info(sff=$n, flow=T)"; done
for n in B*.flow; do mothur "#trim.flows(flow=$n, oligos=B.oligos, pdiffs=2)"; done
for n in E*.flow; do mothur "#trim.flows(flow=$n, oligos=E.oligos, pdiffs=2)"; done
for n in *.flow; do mothur "#shhh.flows(flow=$n, processors=2)"; done

Thanks a lot.
Pat, any other thoughts will be much appreciated… :slight_smile:

About all I can recommend is to ask your sequence provider to not split everything up like this :slight_smile: