Demultiplexing only some part of fastq files

Hello Pat and Mothur team,
I have 26 pairs of fastq files, each containing a different combination of barcodes. Only one barcode combination within each R1-R2 pair is of interest to me; the rest belong to other projects that I am not involved with.
The following is my oligo file:
############################
$ cat oligos-map4mothur.tab
primer CAGCMGCCGCGGTAATWC CCGTCAATTCCTTTRAGGTT 519F926R
BARCODE ggtac aggaa BfleaM1
BARCODE ggtac gagact NC014M10
BARCODE ggtac cgattcc FOF012M11
BARCODE ggtac tctcaatc DS006M12
BARCODE ggtac gagtgg XfleaM2
BARCODE ggtac ccacgtc FfleaM3
BARCODE ggtac ttctcagc SfleaM4
BARCODE ggtac ctagg NOfleaM5
BARCODE ggtac tgctta SAC023M6
BARCODE ggtac gcgaagt FOF007M7
BARCODE ggtac aatcctat DS024M8
BARCODE ggtac atctg SAC004M9
BARCODE caacac aggaa SAC015M13
BARCODE caacac gagact NC036M22
BARCODE caacac cgattcc SAC001M23
BARCODE caacac tctcaatc SAC002M24
BARCODE caacac gagtgg DS027M14
BARCODE caacac ccacgtc DS021M15
BARCODE caacac ttctcagc DS015M16
BARCODE caacac ctagg DS016M17
BARCODE caacac tgctta DS040M18
BARCODE caacac gcgaagt DS022M19
BARCODE caacac aatcctat NC027M20
BARCODE caacac atctg NC028M21
BARCODE atcggtt aggaa SAC003M25
BARCODE atcggtt gagtgg SAC005M26
############################################

I ran make.contigs from mothur v.1.39.1 as follows:
mothur > make.contigs(file=flea.txt, oligos=oligos-map4mothur.tab, bdiffs=1, pdiffs=2, checkorient=t, processors=7)
It ran without errors. However, judging by what was printed on the screen, I am afraid it is treating the whole content of every fastq file as my samples, instead of only the sequences that carry the barcodes listed in the oligos file for each sample.

############################################
It took 2327 secs to process 5327581 sequences.
Group count:
BfleaM1 120317
DS006M12 334566
DS015M16 10615
DS016M17 19446
DS021M15 9775
DS022M19 13734
DS024M8 295246
DS027M14 20810
DS040M18 7933
FOF007M7 321275
FOF012M11 261574
FfleaM3 204135
NC014M10 257191
NC027M20 14456
NC028M21 14559
NC036M22 10184
NOfleaM5 304042
SAC001M23 10311
SAC002M24 12102
SAC003M25 26277
SAC004M9 242164
SAC005M26 21569
SAC015M13 12304
SAC023M6 328508
SfleaM4 308685
XfleaM2 241776

Total of all groups is 3423554

Output File Names:
flea.trim.contigs.fasta
flea.trim.contigs.qual
flea.contigs.report
flea.scrap.contigs.fasta
flea.scrap.contigs.qual
flea.contigs.groups

[WARNING]: your sequence names contained ‘:’. I changed them to ‘_’ to avoid problems in your downstream analysis.
####################################################

Can anyone shed light on how to make mothur consider only the sequences with the target barcodes (those in the oligos file) within each fastq pair?


JFYI, I've also tried running it for one sample only and got the following error:
##########################################################
mothur > make.contigs(ffastq=M1_S1_L001_R1_001.fastq.gz, rfastq=M1_S1_L001_R2_001.fastq.gz, oligos=oligos-map4mothur-M1.tab, bdiffs=1, pdiffs=2, checkorient=t, processors=7)

Using 7 processors.
x�vyK�b���٥sӪ��’%AR1��n�LI����bØѶ is in your forward fastq file and not in your reverse file, please remove it using the remove.seqs command before proceeding.
is in your forward fastq file and not in your reverse file, please remove it using the remove.seqs command before proceeding.
�l*���+=������V��u���g�55S�6v^r�ui�����C������csS%��!�i0�?�ttQTQ��j���1�"�S"!�<�J�8�5G�B$�X<'R��h is in your forward fastq file and not in your reverse file, please remove it using the remove.seqs command before proceeding.
p4�����"� is in your forward fastq file and not in your reverse file, please remove it using the remove.seqs command before proceeding.
�Yʖ͖f�r�덼�9x��I��$�G’m�t&���
~ is in your forward fastq file and not in your reverse file, please remove it using the remove.seqs command before proceeding.
�-2�J���i-$ is in your forward fastq file and not in your reverse file, please remove it using the remove.seqs command before proceeding.
��’ is in your forward fastq file and not in your reverse file, please remove it using the remove.seqs command before proceeding.
##########################################################

Here’s the oligo file for the M1 sample only:
$ cat oligos-map4mothur-M1.tab
primer CAGCMGCCGCGGTAATWC CCGTCAATTCCTTTRAGGTT 519F926R
BARCODE ggtac aggaa BfleaM1
BARCODE tgattgac tctcaatc Blank
BARCODE ggtac tctcaatc D1
BARCODE caacac tctcaatc D13
BARCODE atcggtt tctcaatc D25

Looking forward to hearing from you,
Thanks very much,
Elton

You can set barcode group names to ‘ignore’. When the group name is ‘ignore’, mothur will not include those sequences in the results. For example, the following would only add sequences from NC014M10 and DS006M12 to the *.contigs files.

primer CAGCMGCCGCGGTAATWC CCGTCAATTCCTTTRAGGTT 519F926R
BARCODE ggtac aggaa ignore
BARCODE ggtac gagact NC014M10
BARCODE ggtac cgattcc ignore
BARCODE ggtac tctcaatc DS006M12
BARCODE ggtac gagtgg ignore

Thanks very much, “westcott”.
Nevertheless, my case is still a bit complicated, because the same combination of barcodes may occur in two or more R1-R2 pairs of fastq files, but I want to grab it from only one specific R1-R2 pair; in the others it can be ignored.
This is because it is a two-step tailed amplicon approach from Nextera/Illumina:
Barcodes from first step (iNextF A-H, iNextR 1-12) = 96 combinations for multiplexing
Barcodes from second step (i5 A-H and i7 1-12) = 96 combinations for multiplexing
So, 96 x 96 = 9,216 samples that can be differentially tagged.
The first demultiplexing step (getting rid of i5 and i7) is done automatically by the sequencer software during the fastq files generation.
Therefore, the user has to handle only the second demultiplexing step, which is best performed one fastq pair at a time because, for instance, iNextF_A-iNextR_1 will be present in different R1-R2 fastq files corresponding to different samples.

I have already figured out how to perform that task with split_libraries.py QIIME script under a bash “for” loop for each fastq pair.
However, since I attended the last mothur workshop and was amazed by the “behind the scenes” strategy of make.contigs, I strongly suspect that make.contigs will give me more reliable contigs/amplicons at the single-base quality level.

QUESTIONS:
Will I have to run make.contigs independently for each fastq pair, or is there still a way of doing that in a batch with the files file?
If the former, what would be your recommendation for merging all the .contigs files (and probably the other outputs from make.contigs) into a single one within the mothur environment?
I’m afraid a simple bash cat might not work.

Thanks for your support,
I hope I have made myself clear,
Best,
Elton

Here’s my for loop for running make.contigs on a bunch of fastq files separately. I wrote this when we were only using one index, but you can easily adapt it.

### When combining multiple sequencing runs, you have to run make.contigs on each run individually.
###
### This for loop will work for samples where the files are named like
###  9_2_14_Undetermined_S0_L001_R1_001.fastq
###  9_2_14_Undetermined_S0_L001_R2_001.fastq
###  9_2_14_Undetermined_S0_L001_I1_001.fastq
###  9_2_14.oligos
###
for o in *.oligos; do
  b=$(basename "$o" .oligos)
  i=${b}_Undetermined_S0_L001_I1_001.fastq
  f=${b}_Undetermined_S0_L001_R1_001.fastq
  r=${b}_Undetermined_S0_L001_R2_001.fastq
  mothur "#make.contigs(processors=16, ffastq=$f, rfastq=$r, findex=$i, oligos=$o)"
done

Hey kmitchel,
Thanks a lot!
In principle, your for loop should work fine on my data too.
I was busy with other stuff the entire week.
I’ll get my hands on mothur next Monday and let you know how it works in practice.

Thanks again,
Best,
Elton

Stopping by just to say that it worked well!

-> My for loop was like this:
$ for o in oligos-prep/oligos_*; do
    n=$(echo "$o" | sed 's/.*M//g' | sed 's/\.tab//g')   # sample number after the last "M"
    f=M${n}_S${n}_L001_R1_001.fastq
    r=M${n}_S${n}_L001_R2_001.fastq
    mothur "#make.contigs(ffastq=$f, rfastq=$r, oligos=$o, bdiffs=1, pdiffs=2, checkorient=t, processors=6)"
  done

-> The oligos-prep/ dir contains the following files:
oligos_BfleaM1.tab
oligos_DS006M12.tab
oligos_DS015M16.tab
oligos_DS016M17.tab
oligos_DS021M15.tab
oligos_DS022M19.tab
oligos_DS024M8.tab
oligos_DS027M14.tab
oligos_DS040M18.tab
oligos_FfleaM3.tab
oligos_FOF007M7.tab
oligos_FOF012M11.tab
oligos_NC014M10.tab
oligos_NC027M20.tab
oligos_NC028M21.tab
oligos_NC036M22.tab
oligos_NOfleaM5.tab
oligos_SAC001M23.tab
oligos_SAC002M24.tab
oligos_SAC003M25.tab
oligos_SAC004M9.tab
oligos_SAC005M26.tab
oligos_SAC015M13.tab
oligos_SAC023M6.tab
oligos_SfleaM4.tab
oligos_XfleaM2.tab

-> And my fastq pairs are named like the following:
M1_S1_L001_R1_001.fastq
M1_S1_L001_R2_001.fastq
M2_S2_L001_R1_001.fastq
M2_S2_L001_R2_001.fastq
M3_S3_L001_R1_001.fastq
M3_S3_L001_R2_001.fastq
… and so on until M26_S26.*fastq
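
The mapping from oligos file name to fastq pair can also be done with bash parameter expansion instead of echo and sed. The sketch below only prints the mothur commands (a dry run), so the derived file names can be checked before anything is executed. `contigs_cmd` is a hypothetical helper name, and the naming scheme is assumed to be exactly the one listed above.

```shell
#!/usr/bin/env bash
# contigs_cmd is a hypothetical helper: it prints, but does not run, the mothur call.
contigs_cmd() {
  local oligos=$1
  local base=${oligos##*/}   # strip the directory: oligos_DS006M12.tab
  base=${base%.tab}          # strip the extension: oligos_DS006M12
  local n=${base##*M}        # keep the text after the last "M": 12
  printf 'mothur "#make.contigs(ffastq=M%s_S%s_L001_R1_001.fastq, rfastq=M%s_S%s_L001_R2_001.fastq, oligos=%s, bdiffs=1, pdiffs=2, checkorient=t, processors=6)"\n' \
    "$n" "$n" "$n" "$n" "$oligos"
}

# Dry run for two of the oligos files; replace the list with oligos-prep/oligos_*.tab.
for o in oligos-prep/oligos_BfleaM1.tab oligos-prep/oligos_DS006M12.tab; do
  contigs_cmd "$o"
done
```

Once the printed commands look right, piping the output to `bash` would run them one pair at a time.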

Checking whether all the contig names are unique

$ wc -l *.groups
$ grep -c '>' *trim.contigs.fasta
$ grep -c '>' *trim.contigs.fasta | sed 's/:/\t/g' | cut -f 2 | awk '{sum += $1} END {print sum}'
$ cat *.groups | cut -f 1 | sort -u | wc -l
#–> YEAH! Good to go!
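
The comparison above can be condensed into a single check: no name in the first column of the .groups files should occur more than once. A minimal sketch, with `names_are_unique` as a hypothetical helper name and tab-separated .groups files (name, then group) assumed; the demo data are made up.

```shell
#!/usr/bin/env bash
# names_are_unique is a hypothetical helper.
# Exit status is 0 when no name in column 1 occurs twice across the given files.
names_are_unique() {
  awk -F '\t' 'seen[$1]++ { dup = 1 } END { exit (dup ? 1 : 0) }' "$@"
}

# Demo with two made-up groups files in a temporary directory.
demo=$(mktemp -d)
printf 'M1_read1\tBfleaM1\nM1_read2\tBfleaM1\n' > "$demo/a.groups"
printf 'M2_read1\tXfleaM2\n' > "$demo/b.groups"

if names_are_unique "$demo"/*.groups; then
  echo "all contig names are unique"
else
  echo "duplicate contig names found"
fi
```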

-> Then I moved all the outputs from the individual make.contigs runs to a new dir and merged them into single files that can be used in further mothur commands:
$ mkdir mkctg-individual-outputs
$ mv *.contigs.* mkctg-individual-outputs/
$ cat mkctg-individual-outputs/*.groups >fleas.contigs.groups
$ cat mkctg-individual-outputs/*.report >fleas.contigs.report
$ cat mkctg-individual-outputs/*.scrap.contigs.fasta >fleas.scrap.contigs.fasta
$ cat mkctg-individual-outputs/*.scrap.contigs.qual >fleas.scrap.contigs.qual
$ cat mkctg-individual-outputs/*.trim.contigs.qual >fleas.trim.contigs.qual
$ cat mkctg-individual-outputs/*.trim.contigs.fasta >fleas.trim.contigs.fasta
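
One possible caveat with plain cat: if each .contigs.report starts with a column-header line (an assumption here, worth checking with `head -1`), the merged report will repeat that header once per input file. A sketch that keeps only the header of the first file; `merge_with_one_header` is a hypothetical helper name and the column names in the demo are placeholders, not mothur's real ones.

```shell
#!/usr/bin/env bash
# merge_with_one_header is a hypothetical helper.
# It concatenates files but drops the first line of every file except the first.
merge_with_one_header() {
  # FNR == 1 is true on the first line of each file; NR == 1 only on the very first line read.
  awk 'FNR == 1 && NR != 1 { next } { print }' "$@"
}

# Demo with two made-up report files (placeholder column names).
demo=$(mktemp -d)
printf 'Name\tLength\nM1_read1\t410\n' > "$demo/M1.contigs.report"
printf 'Name\tLength\nM2_read1\t409\n' > "$demo/M2.contigs.report"

merge_with_one_header "$demo"/*.contigs.report
```

The fasta, qual and groups files have no header line, so plain cat is fine for those.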


Thanks guys, Best, Elton