DEMULTIPLEXING MISEQ PAIRED READS

Hello,

I am very new to Illumina sequencing and to mothur. I just received my sequences back from the sequencing facility after a MiSeq run. They sent me four files:

  1. Index file: Undetermined_S0_L001_I1_001.fastq
  2. R1: Undetermined_S0_L001_R1_001.fastq
  3. R2: Undetermined_S0_L001_R2_001.fastq
  4. Mapping file with barcodes, with the header: #SampleID BarcodeSequence LinkerPrimerSequence Description

I know I have to demultiplex, but I am not sure whether there is a command for it in mothur. I think it could be done with trim.seqs, but I am not sure what my oligos file should look like. I also don't really understand what the index file is for.

Any suggestions?

Thank you!

Please see the wiki page for make.contigs and then let me know where you’re at with your data.

Pat

Hi Pat,

Following the MiSeq SOP for make.contigs, I used the following command:
make.contigs(ffastq=Undetermined_S0_L001_R1_001.fastq, rfastq=Undetermined_S0_L001_R2_001.fastq, findex=Undetermined_S0_L001_I1_001.fastq, oligos=epibiont.oligos, processors=7)

My oligos file that I created looks like this:

primer GTGTGCCAGCMGCCGCGGTAA CCGGACTACHVGGGTWTCTAAT
barcode CATGTCTTCCAT NONE SPAK11R
barcode GTTACAGTTGGC NONE SPAK11P
barcode CGGACTCGTTAC NONE SPAK13R
barcode TCTCGCACTGGA NONE SPAK13P
barcode TTCTGGTCTTGT NONE SPAK24bR
barcode GTCCACTTGGAC NONE SPAK23P

The first line corresponds to the forward and reverse primers according to the Caporaso paper. I also included the primer linkers in the sequences (the leading CC and GT); I am not sure if that might have caused the problem.

I only have the forward index file, which is why I put NONE in the third column.

After mothur processed everything it created 4 files (see below), but I see some problems:


  1. Undetermined_S0_L001_R1_001.trim.contigs.fasta --> this file is empty.
  2. Undetermined_S0_L001_R1_001.scrap.contigs.fasta --> it looks like this:

>M02149_86_000000000-A94DF_1_1101_15116_1417 | bf
GGAGAAAAGGGGAAAAGAGAGGAAGGAAGAGGAAAGAGGAAAGAAGAGGGGAGAGGGGAAAGGAGGAGAGAGAAGGAGGAAAAAAAGGGGAGAAGGAGGAAAAAAGGAAAAAAAAGGAAGAGGAGGAGAGAAAGAGAGGGGAGAAAAAAGG-TCCTTCTTTTTCTCTCTTTTTTCTTCTTTCCTTTTCTTCCCTCTCTCTTCTTCTTTTTTTTCCTTCTTTTTTTCCCTCCCTTTTCTCCCCCCTTTTCCTTCCTTTTCTCCTTTTTTTCTTTCTTTTTTTCTTTTTTTTTTTCTTTCCTTTT
>M02149_86_000000000-A94DF_1_1101_15482_1526 | bf
GGGGAAAATGGGAGGAGAGAGTATGGTAGAGGGGGGAGGAAAGCCAGGGGTAGAGGTGAAATGAGGAGAGAGGAGGAGGAAAAACGGTGGAGAAGGAGGAAAAAAGGAAGAAAAATGAAGAGGAGGAGAGAAAGAGAGGGGAGAAAAAAGG-TCCTTCTTTTTCTCTCTTTTTTCTTCTTTCCTTTTCTTCCCTCTCTCTTCTTCTTTCTTTTCCTTCTTTTTTTCCCTCCCCCTTCTTCCCCTTTTTACTTCATTTTCTACTTTTTTTCTTTCTTCTCTTCTCTTTTGTTTTCATTCCTTTT
>M02149_86_000000000-A94DF_1_1101_16320_1606 | bf
TACGTAGGTTGCAAGCGTTCTCCGGATTTACTGGGCGTAAAGCGTCTGTAGGCGGTTTAATAAGTCTGCTGTTAAATCCTTTGGCTCAACCTCAAAATTGCATTNGAAACTNTTNGACTAGAGTATAGTAGAGGTAAAGGGAATTNCNAGTGGAGCGGTGAAATGCGTAGAGATAGGGAAGAACACCAAGGGCGAAGGCAGTTTAATGGGCTAATACTGACGCTGAGGGACGAAAGCGAGGGTAGCAAATAGG
>M02149_86_000000000-A94DF_1_1101_16161_1623 | bf
TACGGAGGATCCAAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGTTTGATAAGTTAGAGGTGAAATACCGGGGCTCAACTCCGGAACTGCCTCTAATACTGTTGAACTAGAGAGTAGTTGCGGTAGGCGGAATGTATGGTGTAGCGGTGAAATGCTTAGAGATCATACAGAACACCGATTGCGAAGGCAGCTTACCAAACTATATCTGACGTTGAGGCACGAAAGCGTGGGGAGCAAACAGG
>M02149_86_000000000-A94DF_1_1101_16370_1635 | bf
TACAGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGTGGTTCGTTAAGTTGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTCAAAACTGACGAGCTAGAGTATGGTAGAGGGTGGTGGAATTTCCTGTGTAGCGGTGAAATGCGTAGATATAGGAAGGAACACCAGTGGCGAAGGCGACCACCTGGACTGATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACAGG
>M02149_86_000000000-A94DF_1_1101_16681_1642 | bf
TACGGAGGGGGCTAGCGTTGTTCGGAATTACTGGGCGTAAAGAGTGCGTAGGCGGTTTAGTAAGTTGGAAGTGAAAGCCCGGGGCTTAACCTCGGAATTGCTTTCAAAACTACTAATCTAGAGTGTAGTAGGGGATGATGGAATTCCTAGTGTAGAGGTGAAATTCTTAGATATTAGGAGGAACACCGGTGGCGAAGGCGGTCATCTGGGCTACAACTGACGCTGATGCACGAAAGCGTGGGGAGCAAACAGG

  3. Undetermined_S0_L001_R1_001.contigs.report --> I just see the headers (Name Length Overlap_Length Overlap_Start Overlap_End MisMatches Num_Ns), but no data.
  4. Undetermined_S0_L001_R1_001.contigs.groups --> it is also empty.

I am not sure where the error is. Maybe the oligos file or the primers are not correct?

Thanks!

Can you try putting the NONE in the second column instead of the third? I think they might actually be the reverse index. If that doesn’t work, could you send me a link to where I can get copies of your files from dropbox or google drive?
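
One way to check the orientation before re-running make.contigs is to count how many index reads match the barcodes as listed and how many match their reverse complements. The following is a minimal Python sketch, not part of mothur. The function names are made up for illustration, and it assumes a plain four-lines-per-record fastq index file and exact matches only.

```python
from collections import Counter

# Translation table for complementing A/C/G/T (other characters are left as-is).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tally_orientations(index_lines, barcodes):
    """Count index reads matching the barcodes as given, reverse-complemented, or neither.

    index_lines: iterable of lines from a fastq index file (4 lines per record).
    barcodes: iterable of barcode strings as written in the mapping/oligos file.
    """
    as_given = set(barcodes)
    flipped = {reverse_complement(b) for b in as_given}
    tally = Counter()
    for i, line in enumerate(index_lines):
        if i % 4 != 1:  # the sequence is the second line of each record
            continue
        index = line.strip()
        if index in as_given:
            tally["as_given"] += 1
        elif index in flipped:
            tally["reverse_complement"] += 1
        else:
            tally["neither"] += 1
    return tally


if __name__ == "__main__":
    # Tiny in-memory example; the read names and index reads here are invented.
    demo_index = [
        "@read1 1:N:0:0", "ATGGAAGACATG", "+", "IIIIIIIIIIII",
        "@read2 1:N:0:0", "NNNNNNNNNNNN", "+", "IIIIIIIIIIII",
    ]
    print(tally_orientations(demo_index, ["CATGTCTTCCAT", "GTTACAGTTGGC"]))
```

For a real run you would pass an open file handle instead of the list, for example `open("Undetermined_S0_L001_I1_001.fastq")`. If nearly all assigned reads fall under `reverse_complement`, the barcodes in the oligos file probably need to be reverse-complemented.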

Pat

Hi Pat,

I did what you suggested but got the same problem. I sent the files to your email.

Thanks!

Hi mrm,
I am facing the same problem. Have you solved it? If you have, could you give me some advice?

I am having the same problem too! Did you get your sequences from Argonne National Laboratories? I wanted the data demultiplexed and in two fastq files but they said they are “unable” to do so. Any suggestions on how to get around this would be greatly appreciated.

thanks

I too had this issue, and I also have my samples sequenced at Argonne. I have a colleague who de-multiplexed my data for me. I’ve been trying to get the script so that I can do this myself in the future.

The technician I was working with told me that I should learn Qiime because they format their data for that platform, which is really irritating. I should have the right to choose which analysis platform I want without all this hassle.

Hi,

Yes, I had my samples sequenced at Argonne National Laboratories. It is very irritating because I have seen in an old thread that you can demultiplex offline, but the technician I worked with told me they could not do it.

He finally agreed to demultiplex the files for me, but he used Qiime instead of the MiSeq software, so he provided me with separate R1 and R2 fasta files instead of fastq files. When I try to run make.contigs it gives me two errors:

reading >A1_0 expected a name with @ as a leading character, ignoring read.
[WARNING]: reading >A1_62 expected a name with + as a leading character, ignoring.
[WARNING]: names do not match. read >A1_0 for fasta and >A1_62 for quality, ignoring.
[WARNING]: reading >A1_0 expected a name with @ as a leading character, ignoring read.
[WARNING]: reading >A1_63 expected a name with + as a leading character, ignoring.
[WARNING]: names do not match. read >A1_0 for fasta and >A1_63 for quality, ignoring.
[WARNING]: reading >A1_77 expected a name with @ as a leading character, ignoring read.
[WARNING]: reading >A1_84 expected a name with + as a leading character, ignoring.
[WARNING]: names do not match. read >A1_77 for fasta and >A1_84 for quality, ignoring.
[WARNING]: reading >A1_78 expected a name with @ as a leading character, ignoring read.
[WARNING]: reading >A1_85 expected a name with + as a leading character, ignoring.
[WARNING]: names do not match. read >A1_78 for fasta and >A1_85 for quality, ignoring.
[WARNING]: reading >A1_90 expected a name with @ as a leading character, ignoring read.

It looks like it does not find the other read, and it is also looking for quality data, which we don't have.
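
These warnings are what one would expect when fasta files are passed where fastq is expected: a fastq record starts with a header line beginning with "@" and has a "+" separator line followed by quality scores, while a fasta record starts with ">" and carries no qualities. A quick way to confirm which format a file is in, as a small Python sketch (the helper name is hypothetical, and it only inspects the first non-blank line):

```python
def sniff_sequence_format(lines):
    """Guess 'fastq', 'fasta', or 'unknown' from the first non-blank line.

    lines: any iterable of text lines, e.g. an open file handle.
    """
    for line in lines:
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith("@"):
            return "fastq"
        if line.startswith(">"):
            return "fasta"
        return "unknown"
    return "unknown"  # empty input


if __name__ == "__main__":
    # In-memory examples; the fasta header mirrors the names in the warnings above.
    print(sniff_sequence_format([">A1_0", "TACGGAGG"]))
    print(sniff_sequence_format(["@M02149:86 1:N:0:0", "TACGGAGG", "+", "IIIIIIII"]))
```

With a real file, pass an open handle, for example `sniff_sequence_format(open("R1.fasta"))` (file name assumed). If it reports fasta, the quality scores have already been discarded and the file cannot be given to make.contigs as ffastq or rfastq.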

stevewhitem, would you mind sharing with us the file format that you got after your friend demultiplexed for you, and the commands you used afterwards? THAT WOULD BE SUPER HELPFUL!

THANKS!!

Fenduoduo, sorry, I have not solved it. Pat asked me to send him the files but he has not responded yet. I will post a response if he does.

Sorry for the delay. We seem to have a small bug in how we process the reverse index reads. Until the next release comes out (end of summerish), here’s what you need to do…

First, you don’t need the primer line in the oligos file since the sequencing primers aren’t actually in the sequence data.
Second, you need to use the reverse complement of the barcode sequence and put it in the second column of the oligos file (not the third).
Third, when you run make.contigs, use findex, not rindex.

So the command would look like this:
mothur "#make.contigs(ffastq=r1.fastq, rfastq=r2.fastq, findex=indices.fastq, oligos=m1.rc_oligos)"

If you have an oligos file that has the Caporaso reverse indices in the third column, the following R code will get you the correct formatting (assuming you've removed the primer line):

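# Reverse-complement a DNA string (A/C/G/T only).
reverseComp <- function(x) {
 rev.x <- paste0(rev(unlist(strsplit(x, ""))), collapse="")
 rev.x <- chartr("ATGC", "TACG", rev.x)
 return(rev.x)
}

# Read the tab-separated oligos file: one line per barcode, four columns.
a <- scan(file="my.oligos", sep="\n", what="")
barcodes <- matrix(unlist(strsplit(a, "\t")), ncol=4, byrow=TRUE)

# Put the reverse complement of column 3 into column 2, then set column 3 to NONE.
barcodes[,2] <- unlist(lapply(barcodes[,3], reverseComp))
barcodes[,3] <- rep("NONE", nrow(barcodes))

# Write the result tab-separated, matching the input layout.
write.table(file="my.rc_oligos", barcodes, sep="\t", row.names=FALSE, col.names=FALSE, quote=FALSE)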

Replace "my.oligos" with the name of your own oligos file; the reformatted file is written to "my.rc_oligos".
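
If you are starting from the barcodes in the mapping file rather than from an existing oligos file, the same layout (reverse-complemented barcode in the second column, NONE in the third, no primer line) can be produced directly. This is a minimal Python sketch under that assumption. The function names are invented for illustration, and the two barcodes are the first two from the oligos file posted earlier in the thread.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string (case-insensitive input)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def rc_oligos_lines(barcode_sample_pairs):
    """Build oligos lines of the form: barcode <RC of barcode> NONE <sample>."""
    return [
        f"barcode\t{reverse_complement(barcode)}\tNONE\t{sample}"
        for barcode, sample in barcode_sample_pairs
    ]


if __name__ == "__main__":
    pairs = [("CATGTCTTCCAT", "SPAK11R"), ("GTTACAGTTGGC", "SPAK11P")]
    for line in rc_oligos_lines(pairs):
        print(line)
```

Redirecting the printed lines to a file gives something that can be passed to make.contigs with oligos= and findex=, as in the command above.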

Just FYI for everyone: if you are using Argonne National Labs and the Earth Microbiome primers, the correct start and end positions for pcr.seqs and screen.seqs are start=13862, end=23444.
Quick edit: I found it's best to use optimize=start-end for screen.seqs instead of the alignment positions.

Greetings
Sorry to jump on this old thread. I have the same problem with MiSeq data from Argonne. Even with the suggestions from Pat Schloss (using the reverse complement of the barcode sequence in the second column, and findex), I can't seem to get the make.contigs command to work. As in the initial post, when I used rindex with the reverse complement of the barcode sequence in the second column, all the sequences ended up in the scrap.contigs file. I would appreciate some help in getting started with the data analysis.
Thanks
Vijai

Hi,

You say that the primer sequences are not in our sequence data (with the Caporaso strategy); however, I found them in the data set after running make.contigs. I hope you can help me understand. Thanks!

Génesis

Maybe this has already been posted, but I got this idemp fix from GitHub and tried it yesterday. It worked great.

It takes an I1, an R1, and (optionally) an R2 file and splits them into individual samples. It works with either gzipped or plain-text fastq files. The output was gzipped, but that's easy to remedy.
It also requires a "barcodes" file, but it's simple to make if you have a QIIME mapping file. Similar to an "oligos" file, it's just a tab-separated text file in the following format:

Barcodes Sampleid

ATGT… Sample1
GTAC… Sample2

etc…
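
As a rough illustration of making that barcodes file from a QIIME mapping file, here is a minimal Python sketch. It assumes the mapping file is tab-separated with the #SampleID and BarcodeSequence columns mentioned at the top of this thread, and it writes the two-column layout exactly as described in this post. The function name is hypothetical, and the idemp README should be checked for the layout it actually expects.

```python
def mapping_to_barcode_rows(mapping_lines):
    """Convert QIIME mapping-file lines into 'barcode<TAB>sample' rows.

    mapping_lines: iterable of lines; the first non-blank line must be the
    tab-separated header containing '#SampleID' and 'BarcodeSequence'.
    """
    rows = [line.rstrip("\r\n").split("\t") for line in mapping_lines if line.strip()]
    if not rows:
        raise ValueError("mapping file is empty")
    header = rows[0]
    sample_col = header.index("#SampleID")
    barcode_col = header.index("BarcodeSequence")

    out = ["Barcodes\tSampleid"]  # header line as shown in the post above
    for fields in rows[1:]:
        if fields[0].startswith("#"):
            continue  # skip any further comment lines
        out.append(f"{fields[barcode_col]}\t{fields[sample_col]}")
    return out


if __name__ == "__main__":
    # In-memory example using two samples from earlier in the thread;
    # the primer and description values are placeholders.
    demo = [
        "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tDescription",
        "SPAK11R\tCATGTCTTCCAT\tGTGTGCCAGCMGCCGCGGTAA\tplaceholder",
        "SPAK11P\tGTTACAGTTGGC\tGTGTGCCAGCMGCCGCGGTAA\tplaceholder",
    ]
    print("\n".join(mapping_to_barcode_rows(demo)))
```

Looking the columns up by header name rather than by position means extra metadata columns in the mapping file do not matter.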

The staff who did my sequencing at Argonne NL sent me this link.

Just in case anyone comes across this thread like I did and wonders whether Pat’s advice from 2014 still applies, it does. Worked like a charm for me.

Cheers,
Beth

Did anyone solve this issue? I am still experiencing the same problem.
Thanks!

I am also struggling with this, as I am trying (and failing) to help a couple people demultiplex their data using mothur. Some practical advice from the experts would be greatly appreciated.

To be specific, I would like to go from the _R1.fastq, _R2.fastq and _I1.fastq to demultiplexed fastq files for the forward/reverse reads, e.g. _SamA_R1.fastq, _SamA_R2.fastq, _SamB_R1.fastq, _SamB_R2.fastq, … without merging the reads.

This is doable using other tools (e.g. QIIME/idemp) but I would love to be able to provide a clear workflow to folks who prefer mothur.
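
For reference, the operation being asked for (splitting R1/R2 by the index read without merging) can be sketched in a few lines of Python. This is not a mothur workflow and not how idemp or QIIME do it. It is a minimal illustration that assumes uncompressed four-line fastq records, the same read order in all three files, read names that agree up to the first space, and exact barcode matches only. All function names are made up.

```python
import itertools
import os


def read_fastq(lines):
    """Yield (header, sequence, separator, quality) tuples from fastq lines."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\r\n") for line in itertools.islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated fastq record")
        yield record


def read_id(header):
    """Read name up to the first whitespace (drops the '1:N:0:...' style comment)."""
    return header.split()[0]


def assign_reads(r1_lines, r2_lines, i1_lines, barcode_to_sample):
    """Yield (sample, r1_record, r2_record); unmatched reads go to 'unassigned'."""
    triples = itertools.zip_longest(
        read_fastq(r1_lines), read_fastq(r2_lines), read_fastq(i1_lines)
    )
    for r1, r2, i1 in triples:
        if r1 is None or r2 is None or i1 is None:
            raise ValueError("R1, R2 and I1 contain different numbers of reads")
        if not read_id(r1[0]) == read_id(r2[0]) == read_id(i1[0]):
            raise ValueError(f"read names out of sync at {r1[0]}")
        yield barcode_to_sample.get(i1[1], "unassigned"), r1, r2


def write_per_sample(assignments, out_dir):
    """Write <sample>_R1.fastq and <sample>_R2.fastq files into out_dir."""
    handles = {}
    try:
        for sample, r1, r2 in assignments:
            if sample not in handles:
                handles[sample] = (
                    open(os.path.join(out_dir, f"{sample}_R1.fastq"), "w"),
                    open(os.path.join(out_dir, f"{sample}_R2.fastq"), "w"),
                )
            fwd, rev = handles[sample]
            fwd.write("\n".join(r1) + "\n")
            rev.write("\n".join(r2) + "\n")
    finally:
        for fwd, rev in handles.values():
            fwd.close()
            rev.close()
```

To use it, open the three files, build `barcode_to_sample` from the mapping file, and call `write_per_sample(assign_reads(r1, r2, i1, barcode_to_sample), "demux")` (the output directory name is an assumption and must already exist). Per the advice earlier in the thread, the dictionary keys may need to be the reverse complements of the barcodes in the mapping file. One output pair is kept open per sample, so very large sample counts may hit the operating system's open-file limit.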

Have you followed the instructions from earlier in the thread? What are you still not able to do?