Hi,
I’ve tried to use fastq.info to sort my original fastq sequences by barcodes.
As the original dataset is quite large and the sorting takes a long time, I split it into 30 separate pieces and started the command on each piece simultaneously (hoping for a 30x gain in run time).
So I started the following bash script:
./mothur "#fastq.info(fastq=okt2012_001.assembled.fastq, oligos=12oligos.tsv, pdiffs=1, bdiffs=0,checkorient=T)" &
./mothur "#fastq.info(fastq=okt2012_002.assembled.fastq, oligos=12oligos.tsv, pdiffs=1, bdiffs=0,checkorient=T)" &
....etc.
./mothur "#fastq.info(fastq=okt2012_030.assembled.fastq, oligos=12oligos.tsv, pdiffs=1, bdiffs=0,checkorient=T)" &
wait
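For reference, the same 30 jobs could also be written as a loop. This is only a sketch of what I ran, not a fix: `MOTHUR`, `DRY_RUN`, `build_cmd` and the `okt2012_NNN.stdout.txt` file names are hypothetical helpers I made up for illustration, and the redirect is plain shell redirection of each job's screen output, not a mothur option. It does not change how mothur names its own logfile.

```shell
#!/usr/bin/env bash
# Sketch: the same 30 fastq.info jobs as above, written as a loop.
# MOTHUR and DRY_RUN are hypothetical helper variables, not mothur options.
MOTHUR="${MOTHUR:-./mothur}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Build the mothur batch string for one chunk number (1..30).
build_cmd() {
    local chunk
    chunk=$(printf '%03d' "$1")   # 5 -> 005
    printf '#fastq.info(fastq=okt2012_%s.assembled.fastq, oligos=12oligos.tsv, pdiffs=1, bdiffs=0,checkorient=T)' "$chunk"
}

for i in $(seq 1 30); do
    cmd=$(build_cmd "$i")
    if [ "$DRY_RUN" = "1" ]; then
        # Show what would be started for this chunk.
        printf '%s "%s"\n' "$MOTHUR" "$cmd"
    else
        # Start the job in the background; its screen output goes to a
        # per-chunk file so the runs can be told apart afterwards.
        "$MOTHUR" "$cmd" > "$(printf 'okt2012_%03d.stdout.txt' "$i")" 2>&1 &
    fi
done

# Block until all background jobs have finished.
wait
```

With the default `DRY_RUN=1` the script only prints the 30 command lines; setting `DRY_RUN=0` would start them all at once, as in my original script.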
It ran for several days and produced 30x8 fastq, fasta and qual output files (8 barcoded samples).
It also produced only ONE logfile, with content like this:
mothur > fastq.info(fastq=okt2012_005.assembled.fastq, oligos=12oligos.tsv, pdiffs=1, bdiffs=0,checkorient=T)
10000
20000
... etc.
....
3748143
Output File Names:
okt2012_010.assembled.B10_ATCGCT.AGAGGT.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.fastq
okt2012_010.assembled.B10_ATCGCT.AGAGGT.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.fasta
okt2012_010.assembled.B10_ATCGCT.AGAGGT.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.qual
okt2012_010.assembled.B17_ATCGCT.AGAACC.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.fastq
okt2012_010.assembled.B17_ATCGCT.AGAACC.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.fasta
okt2012_010.assembled.B17_ATCGCT.AGAACC.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.qual
okt2012_010.assembled.B20_ATCGCT.GTGTAG.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.fastq
okt2012_010.assembled.B20_ATCGCT.GTGTAG.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.fasta
okt2012_010.assembled.B20_ATCGCT.GTGTAG.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.qual
okt2012_010.assembled.B30_ATCGCT.GTCATG.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.fastq
okt2012_010.assembled.B30_ATCGCT.GTCATG.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.fasta
okt2012_010.assembled.B30_ATCGCT.GTCATG.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.qual
okt2012_010.assembled.B32_CATGGT.GGTGTT.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.fastq
okt2012_010.assembled.B32_CATGGT.GGTGTT.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.fasta
okt2012_010.assembled.B32_CATGGT.GGTGTT.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.qual
okt2012_010.assembled.B33_CATGGT.GACCAA.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.fastq
okt2012_010.assembled.B33_CATGGT.GACCAA.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.fasta
okt2012_010.assembled.B33_CATGGT.GACCAA.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.qual
okt2012_010.assembled.B35_CATGGT.CTAACG.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.fastq
okt2012_010.assembled.B35_CATGGT.CTAACG.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.fasta
okt2012_010.assembled.B35_CATGGT.CTAACG.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.qual
okt2012_010.assembled.O3018_GCCATT.GTCTCA.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.fastq
okt2012_010.assembled.O3018_GCCATT.GTCTCA.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.fasta
okt2012_010.assembled.O3018_GCCATT.GTCTCA.CAACGCGARGAACCTTACC.GTCGTCAGCTCGTGTTGT.qual
okt2012_010.assembled.fasta
okt2012_010.assembled.qual
okt2012_010.assembled.scrap.fastq
okt2012_010.assembled.scrap.fasta
okt2012_010.assembled.scrap.qual
[WARNING]: your sequence names contained ':'. I changed them to '_' to avoid problems in your downstream analysis.
mothur > quit()
So is such simultaneous execution of mothur messing with its logging function? It seems to have missed logging most of the commands, and it logged the start of one command (okt2012_005) and the end of another (okt2012_010) into one single logfile. Are my results still reliable? Is it OK to run mothur the way I have done? If not, can I get any hint on how to run a large number of mothur jobs in parallel if needed?