make.contigs vs trim.seqs using Illumina data

Hi there,

Our lab currently uses 454 for 16S rRNA sequencing, but we were recently given some 2x300 bp Illumina data to analyze, and I am having trouble processing it with mothur. I have the files either multiplexed as XXXrun1 and XXXrun2 (with a mapping file, as the barcodes and primers are still attached) or demultiplexed into each sample’s individual fastq files (run1 and run2, with barcodes and primers still attached). I have tried make.contigs and trim.seqs with this data, but everything ends up scrapped or mothur is unable to find mate pairs.

I was wondering if anyone has experienced these problems, or if I need to use an external program such as PANDAseq to stitch the data before using mothur.

Below is an example of the fastq file from one sample (run1 and run2). The barcode the sequencing company used is GAGATGAC (which I can see at the start of both paired sequences), and the forward primer is GTGCCAGCMGCCGCGGTAA, which I can find after the barcode. As I said above, I have tried make.contigs (using an oligos file modified from the mapping file) and trim.seqs with this data, but everything ends up scrapped or mothur is unable to find mate pairs.

RUN1
@M02542:4:000000000-A7D3D:1:1101:16727:1039 1:N:0:3
GAGATGACGTGCCAGCCGCCGCGGTAATACAGAGGGTGCGAACGTTGCTCGGATTTACTGGGCGTAAAGCGCGTGTAGGCGGACTCGCAAGTCGGTTGTGAAATCCCTGGGCTTAACCTAGGAACTGCATCCGAAACTGCTTGTCTTGAGTAATGGAGAGGGTGGCGGAATTCCCGGTGTAGAGGTGAAATTCGTAGATATCGGGAGGAACATCAGTGGCGAAGGCGGCCACCTGGACATTTACTGACGCTGAGACGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCCGGTAGTCCC
+
CCCCCGGGGGGGDFGGG:FEGGGGGEGGFFFG-CEGGGGGGGGGGGGGFFFG@FGFGGGGGGFBFFFGGGGGGFGCE9FG:CEEEGFGBFGDGGGG@+3FFGGGGFGFF+DDGDGGGGGGG <<9FFGGGGAFF<FFGGFGGGCFCFGCCC=FFFFF<F7CGGEGGGEEEEGGFAFGG=CF9FEFFGGGG;FF9CG8:3;9CFGGEEDECCFFCFCCEGFGGC>) 4CEDGEEC>>+75F:F7<9=FFC9:5BCD>F=BB9>) -(24,.446099014:61?:)=25-5)))5>>1((44<5)

RUN2
@M02542:4:000000000-A7D3D:1:1101:15596:1142 2:N:0:3
GAGATGACGTGCCAGCCGCCGCGGTAATACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGTGGTTCTGTAAGTCAGATGTGAAATCCGCGGGCTCAACCTGGGAACTGCGTTTGAAAGTACAAAACTAGAGTGTGGCAGAGGGGGGTGGAATTCCGCGTGTAGCAGTGAAATGCGTAGAGATGTGGAGGAACACCGATGGCGAAGGCAGCCCCCTGGGTCAACACTGACGCCCAGGCCAGAAAGCATGGGGAGTAAACAGGATCAGAAACCCCGGTAGTTA
+
CCBCCFFGGCCF@FGFFGCGGEFEG7FFGGGGFFGGGEGGGCFFGGGGFGGGG7<EFFGGFCFCGCGGGG@GECFDEGGGFDEGGGGGGGGFFGFCF,FECFFFGC8A=BFGEGGFGGGGBCB8FFDAGFEGGGCF,8=DAE9FFGGDGC:EGGGF::ACFGGGGGGD<9<CECGGC:=EFFG7@:<?FFFCGGC:C?B4=FFFFFG?D>?4C09(-(:73549044??F23(-23821>)9?3@F<(-((9(.((((-8:)(336>((8<(((…)).(()(.).))(,(1)(4,(-))
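
For anyone wanting to confirm that the barcode and primer really sit at the start of a read, here is a minimal Python sketch of a position-wise check that honours IUPAC codes (M means A or C). The function names are made up for this illustration, and this is not how mothur matches oligos internally: it only compares bases position by position and does not handle insertions or deletions.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(oligo, read_segment):
    """Count positions where the read base is not allowed by the oligo's IUPAC code."""
    return sum(1 for code, base in zip(oligo, read_segment) if base not in IUPAC[code])


def check_read_start(read, barcode, primer, bdiffs=0, pdiffs=0):
    """Return True if the read begins with the barcode followed by the primer,
    allowing at most bdiffs / pdiffs substitutions (hypothetical helper)."""
    if len(read) < len(barcode) + len(primer):
        return False
    barcode_segment = read[:len(barcode)]
    primer_segment = read[len(barcode):len(barcode) + len(primer)]
    return (count_mismatches(barcode, barcode_segment) <= bdiffs
            and count_mismatches(primer, primer_segment) <= pdiffs)


# Barcode and primer from the post, checked against the start of the RUN1 read.
barcode = "GAGATGAC"
primer = "GTGCCAGCMGCCGCGGTAA"
run1_start = "GAGATGACGTGCCAGCCGCCGCGGTAATACAGAGGG"
print(check_read_start(run1_start, barcode, primer))
```

The bdiffs and pdiffs arguments only mirror the names of the mothur options; here they simply cap the number of substitutions tolerated in each oligo.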

Any help much appreciated, claire

What version of mothur are you using?

I ran the following:

mothur > make.contigs(ffastq=…/…/temp/forward.fastq, rfastq=…/…/temp/reverse.fastq, oligos=…/…/temp/oligos)

With an oligos file that looks like:

barcode GAGATGAC GAGATGAC group1
primer GTGCCAGCMGCCGCGGTAA GTGCCAGCMGCCGCGGTAA

You posted two different fragments, so I changed the names to match so I could test whether mothur would find the barcodes and primers. Make.contigs was able to assemble the read.

Group count:
group1 1
Total of all groups is 1

Hi again,

Thank you for the quick response. I am currently using version 1.32.1, which is installed on my laptop (Windows). I’ll try make.contigs again on the desktop computer in the lab and keep you posted on how this works!

Cheers, claire

Okay, I tried to run make.contigs with the ffastq (4662Arun1) and rfastq (4662Arun2) files, which are from just one demultiplexed sample.

The command I tried was make.contigs(ffastq=4662Arun1.fastq, rfastq=4662Arun2.fastq, oligos=4662A.oligos, pdiffs=3, bdiffs=1)

My oligos file looked like:
barcode GAGATGAC GAGATGAC Group1
primer GTGCCAGCMGCCGCGGTAA GTGCCAGCMGCCGCGGTAA

I ended up with:

[WARNING]: did not find paired read for M02542:4:000000000-A7D3D:1:2119:9994:3045, ignoring. (This message was repeated many times over).

Done.

Processing 4662Arun1.0ffastatemp (file 1 of 1) <<<<<
Making contigs…
7
Done.
It took 50 secs to process 7 sequences.


Output File Names:
4662Arun1.trim.contigs.fasta
4662Arun1.scrap.contigs.fasta
4662Arun1.contigs.report
4662Arun1.contigs.groups

Are the names of the reads in the files matching? If you think they are, can you post a forward and reverse name pair mothur is not matching?
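
If it helps, the names can be compared outside mothur. The following is a rough Python sketch, not part of mothur: it assumes plain, uncompressed fastq files with strict 4-line records and no blank lines, and it treats the text between '@' and the first space as the read name. The function names are invented for this example.

```python
def read_names(fastq_path):
    """Collect read names (text after '@' up to the first whitespace) from a fastq file.

    Assumes strict 4-line records, so only every fourth line is treated as a header.
    """
    names = set()
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0 and line.startswith("@"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def compare_names(forward_path, reverse_path):
    """Count read names shared by both files and names unique to each file."""
    forward = read_names(forward_path)
    reverse = read_names(reverse_path)
    return {
        "shared": len(forward & reverse),
        "forward_only": len(forward - reverse),
        "reverse_only": len(reverse - forward),
    }
```

Calling compare_names("4662Arun1.fastq", "4662Arun2.fastq") returns the three counts; a "shared" count near zero would be consistent with the "did not find paired read" warnings.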

Apparently the names don’t match! For example one of the warning messages was [WARNING]: did not find paired read for M02542:4:000000000-A7D3D:1:2119:9615:10876, ignoring.

Searching my two fastq files: there is only a match in run1 (see below), not run2.

@M02542:4:000000000-A7D3D:1:2119:9615:10876 1:N:0:3
GAGATGACGTGCCAGCAGCCGCGGTAATACGTAGGGAGCAAACGTTGTCCGGATTTATTGGGCGTAAAGGGCTCGTAGGCGGTTCAACAAGTCGGTCGTGAAAGCCCGGGGCTCAACCCCGGGATGCCGGTCGAAACTGTTGTGACTAGAGTTCGGTAGAGGTGAGTGGAATTCTCGGTGTAGCGGTGGAATGCGCAGNTATCGAGAGGAACACCATTAGCGAAGGCGGCTCACTGGGCCGATACTGACGCTGAGGAGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCCCGTAGTCC
+
CCCCCGGGGGGGGGGGGFGGGGGGGGGGGGGGFCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGGGGGFGGGGDDGGGGGGGGGGGG:=CFEGGGGGGGGGGGGFFGGGGGGGGGFGGGGGEGGGGGGG7::FGG?FFFFGGDGGCEGGGGGF:F:FFBFGGGD#22CGGGGF>ECCFFGGFGGGGGGGGGGGEDE7EFDF6CCFGE:EDGGEFCEG8C9G**7>GEC@FBFG:;B38993:DFB?7>6C>?4>AFFFF>>:6>(9B
@M02542:4:000000000-A7D3D:1:2119:23026:10877 1:N:0:3
GAGATGACGTGCCAACCGCCGCGGTAATACGGAGGGGGTTAGCGTTGTTCGGAATTACTGGGCGTAAAGCGCACGTAGGCGGATTGGAAAGTAAGAGGTGAAATCCCAGGGCTCAACCCTGGAACTGCCTTTTAAACTCCCAGTCTTGAGTTCGAGAGAGGTAAGTGGAATTCCAAGTGTAGAGGTGAAATTCGTAGANATTTGGAGGAACACCAGTGGCGAAGGCGGCTTACTGGCTCGATACTGACGCTGAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCCCGTAGTCC
+
CACC@ECEFCCFEGGF@FEGGGGGGEFCEF@BFGGEC@6@FFCCG:@B@ECCB:FEEFFEE=C=@>B<F<@CBCGEFFEGG>@+8BF,5B,<A<??<FGGGDD<?FCFFGCCGGDD=BBFDE<><FGCCCB@3@8C<=@,@>B>CCGFCC=;>F:FGGD>>7FCFGGCAGBCCGF?:FGFFCBGCBCFGC,62FB<B3#148:E7C8EGCCC7CFFG@;88DDC8EGCGGGEC?6FFG8/8CCC+2C:>**49009>CD7>5*7(()–27>C)8:4)=C2=)26)5<>7)-4>>4

All the output files are empty. Just 7 sequences in the scrap.contigs file.

Once again, thank you for your insights on this matter.

Hi again,

I was just wondering: does this mean I cannot use the make.contigs command, as the names of the reads in the run1 and run2 files do not match?

Since the names do not match, mothur assumes the two reads do not come from the same fragment. The make.contigs command will only assemble a forward and a reverse read that share the same name.
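
If some names are shared between the two files and only the order or the membership differs, one option is to write out just the shared reads, in the same order, before running make.contigs. Below is a minimal Python sketch under that assumption. It is not a mothur feature, the function names are invented here, it expects uncompressed fastq with strict 4-line records, and it holds the whole reverse file in memory. If the two files share no names at all, it will simply write empty files.

```python
def parse_fastq(path):
    """Yield (name, record) pairs, where record is the list of 4 lines without newlines."""
    with open(path) as handle:
        while True:
            record = [handle.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:
                break  # end of file (or a trailing blank line)
            yield record[0][1:].split()[0], record


def write_shared_pairs(forward_path, reverse_path, out_forward, out_reverse):
    """Write only the reads whose names occur in both files, in forward-file order.

    Returns the number of pairs written.
    """
    reverse_records = dict(parse_fastq(reverse_path))  # whole reverse file in memory
    kept = 0
    with open(out_forward, "w") as fwd_out, open(out_reverse, "w") as rev_out:
        for name, record in parse_fastq(forward_path):
            mate = reverse_records.get(name)
            if mate is None:
                continue  # no read with this name in the reverse file
            fwd_out.write("\n".join(record) + "\n")
            rev_out.write("\n".join(mate) + "\n")
            kept += 1
    return kept
```

The returned count shows how many pairs survived, which is worth checking before passing the two output files to make.contigs.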

As an alternative, we were provided with a fasta and a fastq file of joined paired ends, with the barcodes removed (I am trying to find out how this was done and with what software).

Since I could not use the make.contigs command, I decided to continue with this data starting at trim.seqs (I used qwindowaverage=30, qwindowsize=5, maxambig=0, maxhomop=8, minlength=100). I then followed the rest of the MiSeq SOP.

However, I have just got to the make.shared and classify.otu section. Looking at the taxonomy file, I have ended up with 60,000 OTUs (at 98%; my matrix would not cluster at 0.03), which I find highly unlikely.

I am starting to wonder about the quality of this data and whether the joined contigs really overlap. I guess with that many unique sequences I am limited to phylotyping my data? I have searched many forums: is it usual for the read names in the raw xx.run1 and xx.run2 files not to match? I also read the Werner (ISME J, 2012) paper about using a single read direction, and I am wondering whether to do the analysis on half of the data from the raw files.

Again, any thoughts on this much appreciated!