Make.contigs primer/barcode information

Hi everyone,

Firstly, I’d like to apologise if I’m repeating queries from previous posts. I searched the forum with Google as recommended, but I couldn’t find the answer I was looking for. Secondly, apologies for what may be very simple questions: I am completely new to this type of analysis and molecular work, having previously only carried out DGGE, TRFLP and qPCR and run reactors!

I have a bacterial Illumina MiSeq run with forward and reverse reads in fastq files. However, I need to remove four samples from this run as they belong to another project. Is there a way of doing this in mothur or otherwise? I cannot seem to open the fastq file in BioEdit; perhaps there’s something else I can use to open the files and remove the samples I don’t want to include?

I also need to remove the phiX sequence, and/or the sequence CCTATCCCCTGTGTGCCTTGGCAGTCTCAG needs to be trimmed after joining the paired reads. The primers used were 515f and 806r. I thought maybe I should use Trimmomatic or Cutadapt for this, but I’m having installation trouble.

I went ahead in mothur and used the make.contigs command to join the reads, and there seem to be no problems in the summary file apart from a read of 500 in there somewhere and some ambiguous base calls. I know I can use the trim.seqs command, but I am wondering how to do this with the information I have given. I found this on the forum:

trim.seqs(fasta=current, oligos=yourOligos, pdiffs=2) - won’t create a group since you don’t have any names
list.seqs(fasta=current) - list trimmed sequences
get.seqs(group=yourGroupFile) - will make sure any sequences that were removed due to the primer are removed from the group file.

Is it possible to use these commands and, if so, at what point? Also, what type of file should I make to give mothur the sequence/barcode/primer information?

Having posted this, I realise that the make.contigs command only produced the trim.contigs and scrap.contigs files and a report file, but no groups file?

Sorry for what are probably very basic and silly questions. Any help would be much appreciated.

Best wishes,

Ciara Keating (MEL lab in NUIG Ireland)

Welcome to the mothur community…

I have a bacterial Illumina MiSeq run with forward and reverse reads in fastq files. However, I need to remove four samples from this run as they belong to another project. Is there a way of doing this in mothur or otherwise? I cannot seem to open the fastq file in BioEdit; perhaps there’s something else I can use to open the files and remove the samples I don’t want to include?

So you need to include your barcode/primer information to get a group file. Then once you’ve completed make.contigs, you can run remove.groups specifying those four groups you want to remove.
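
On the question of opening the files: FASTQ is plain text with four lines per read (name, bases, a "+" line, quality string), so any text tool can show the first few records without loading the whole file. A minimal Python sketch, assuming an uncompressed or gzip-compressed file; the function names `read_fastq_records` and `peek_fastq` are made up for illustration and are not part of mothur:

```python
import gzip
from itertools import islice


def read_fastq_records(lines, limit=3):
    """Return up to `limit` (name, sequence, quality) tuples from FASTQ lines."""
    records = []
    stripped = (line.rstrip("\r\n") for line in lines)
    while len(records) < limit:
        chunk = list(islice(stripped, 4))  # one FASTQ record is four lines
        if len(chunk) < 4:
            break  # end of input or a truncated final record
        name, seq, _plus, qual = chunk
        records.append((name, seq, qual))
    return records


def peek_fastq(path, limit=3):
    """Open a plain or .gz FASTQ file and return its first few records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return read_fastq_records(handle, limit)
```

For example, `peek_fastq("SAM1-16_S1_L001_R1_001.fastq")` would return the first three reads of the forward file named later in this thread. This only looks at the data; removing samples is still best done by group with remove.groups as described above.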

I also need to remove the phiX sequence, and/or the sequence CCTATCCCCTGTGTGCCTTGGCAGTCTCAG needs to be trimmed after joining the paired reads. The primers used were 515f and 806r. I thought maybe I should use Trimmomatic or Cutadapt for this, but I’m having installation trouble.

The sequencing software should have removed the phiX and the CCTATCCCCTGTGTGCCTTGGCAGTCTCAG adapter sequence. Alternatively, for the adapter, you can use trimoverlap=T as an option in make.contigs to only include the region where your two reads overlap - this will remove any remaining adapter.

I went ahead in mothur and used the make.contigs command to join the reads, and there seem to be no problems in the summary file apart from a read of 500 in there somewhere and some ambiguous base calls. I know I can use the trim.seqs command, but I am wondering how to do this with the information I have given. I found this on the forum: trim.seqs(fasta=current, oligos=yourOligos, pdiffs=2) - won’t create a group since you don’t have any names

Have you seen the MiSeq SOP page we have posted? It would probably be good for you to go through this once with our data to see how things are supposed to work and then adapt it to your data.


list.seqs(fasta=current) - list trimmed sequences; get.seqs(group=yourGroupFile) - will make sure any sequences that were removed due to the primer are removed from the group file. Is it possible to use these commands and, if so, at what point? Also, what type of file should I make to give mothur the sequence/barcode/primer information?

I’m not really clear what you’re asking…

Having posted this, I realise that the make.contigs command only produced the trim.contigs and scrap.contigs files and a report file, but no groups file?

It will make a groups file if you give it your barcodes and primer sequences.


Pat

Hello Dr. Schloss

Thank you so much for responding to this post. Apologies for the delay in replying; I wanted to make sure I’d tried my best at the analysis myself. I was able to complete the tutorial without many problems. However, I am completely stuck with my own data. I have tried many things, but I am still stuck at the make.contigs step.

Firstly, I tried “make.contigs(ffastq=SAM1-16_S1_L001_R1_001.fastq, rfastq=SAM1-16_S1_L001_R2_001.fastq, oligos=oligos.txt, trimoverlap=T, processors=8)”.

However, this still did not produce a groups file, and the mothur output seemed to contain just numbers, unlike the example data. The oligos file I used had the forward primer, reverse primer, barcode “letters” and sample ID. I tried some variations of the oligos file. With forward primer, reverse primer, barcode “letters”, NONE and sample ID, it came back saying “barcodes must be paired unless you are using an index file”. With forward primer, reverse primer, barcode “letters”, the “same letters” again and sample ID, it created a group file, but one containing only 74 groups.

I realise that maybe my problem is that I was using just the forward and reverse reads? So I tried making a stability file with the sample ID in column 1, the name of the forward read file in column 2, the name of the reverse read file in column 3, and the index file in column 4.

“make.contigs(file=stabilityfive.file, oligos=oligos.txt, trimoverlap=T, processors=8)”

Using 8 processors.
Reading fastq data…
[WARNING]: can’t find R1.I.DNA, ignoring pair.
[WARNING]: can’t find R1.I.cDNA, ignoring pair.
[WARNING]: can’t find SAM1-16_S2_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S2_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find R1.S1.DNA, ignoring pair.
[WARNING]: can’t find SAM1-16_S3_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S3_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find R1.S1.cDNA, ignoring pair.
[WARNING]: can’t find SAM1-16_S4_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S4_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find R1.S2.DNA, ignoring pair.
[WARNING]: can’t find SAM1-16_S5_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S5_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find R1.S2.cDNA, ignoring pair.
[WARNING]: can’t find SAM1-16_S6_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S6_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find R1.S5.DNA, ignoring pair.
[WARNING]: can’t find SAM1-16_S7_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S7_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find R1.S5.cDNA, ignoring pair.
[WARNING]: can’t find SAM1-16_S8_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S8_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find R1.E.DNA, ignoring pair.
[WARNING]: can’t find SAM1-16_S9_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S9_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find R1.E.cDNA, ignoring pair.
[WARNING]: can’t find SAM1-16_S10_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S10_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find R1.FE.DNA, ignoring pair.
[WARNING]: can’t find SAM1-16_S11_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S11_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find R1.FE.cDNA, ignoring pair.
[WARNING]: can’t find SAM1-16_S12_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S12_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find R2.Day.531.DNA, ignoring pair.
[WARNING]: can’t find SAM1-16_S13_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S13_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find R2.Day.531.cDNA, ignoring pair.
[WARNING]: can’t find SAM1-16_S14_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S14_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find R3.Day.531.DNA, ignoring pair.
[WARNING]: can’t find SAM1-16_S15_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S15_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find R3.Day.531.cDNA, ignoring pair.
[WARNING]: can’t find SAM1-16_S16_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S16_L001_R2_001.fastq, ignoring pair.
Done.

So I tried changing the stability file to contain the forward fastq, reverse fastq, index file, and NONE. This caused a different error.

make.contigs(file=stabilitysix.file, oligos=oligos.txt, trimoverlap=T, processors=8)

Using 8 processors.
Reading fastq data…
[WARNING]: can’t find SAM1-16_S2_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S2_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S3_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S3_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S4_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S4_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S5_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S5_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S6_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S6_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S7_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S7_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S8_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S8_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S9_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S9_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S10_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S10_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S11_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S11_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S12_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S12_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S13_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S13_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S14_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S14_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S15_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S15_L001_R2_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S16_L001_R1_001.fastq, ignoring pair.
[WARNING]: can’t find SAM1-16_S16_L001_R2_001.fastq, ignoring pair.
[WARNING]: reading >M01522:110:000000000-A4LP7:1:1112:12727:25543 expected a name with @ as a leading character, ignoring read.
[WARNING]: reading >M01522:110:000000000-A4LP7:1:2104:6417:17827 expected a name with + as a leading character, ignoring.[WARNING]: names do not match. read >M01522:110:000000000-A4LP7:1:1112:12727:25543 for fasta and >M01522:110:000000000-A4LP7:1:2104:6417:17827 for quality, ignoring.[WARNING]: reading >M01522:110:000000000-A4LP7:1:1111:14266:8949 expected a name with @ as a leading character, ignoring read.
[WARNING]: reading >M01522:110:000000000-A4LP7:1:1111:22516:16261 expected a name with + as a leading character, ignoring.[WARNING]: names do not match. read >M01522:110:000000000-A4LP7:1:1111:14266:8949 for fasta and >M01522:110:000000000-A4LP7:1:1111:22516:16261 for quality, ignoring.[WARNING]: reading >M01522:110:000000000-A4LP7:1:1113:14977:4917 expected a name with @ as a leading character, ignoring read.
[WARNING]: reading >M01522:110:000000000-A4LP7:1:1111:8257:15609 expected a name with + as a leading character, ignoring.[WARNING]: names do not match. read >M01522:110:000000000-A4LP7:1:1113:14977:4917 for fasta and >M01522:110:000000000-A4LP7:1:1111:8257:15609 for quality, ignoring.[WARNING]: reading >M01522:110:000000000-A4LP7:1:2106:24418:25790 expected a name with @ as a leading character, ignoring read.
[WARNING]: reading >M01522:110:000000000-A4LP7:1:1113:13459:23472 expected a name with + as a leading character, ignoring.[WARNING]: names do not match. read >M01522:110:000000000-A4LP7:1:2106:24418:25790 for fasta and >M01522:110:000000000-A4LP7:1:1113:13459:23472 for quality, ignoring.[WARNING]: Lengths do not match for sequence >M01522:110:000000000-A4LP7:1:2106:24418:25790. Read 282 characters for fasta and 300 characters for quality scores, ignoring read.[WARNING]: reading >M01522:110:000000000-A4LP7:1:2107:17079:21072 expected a name with @ as a leading character, ignoring read.
[WARNING]: reading >M01522:110:000000000-A4LP7:1:2106:18041:21540 expected a name with + as a leading character, ignoring.[WARNING]: names do not match. read >M01522:110:000000000-A4LP7:1:2107:17079:21072 for fasta and >M01522:110:000000000-A4LP7:1:2106:18041:21540 for quality, ignoring.[WARNING]: Lengths do not match for sequence >M01522:110:000000000-A4LP7:1:2107:17079:21072. Read 300 characters for fasta and 291 characters for quality scores, ignoring read.[WARNING]: reading >M01522:110:000000000-A4LP7:1:2101:15331:27248 expected a name with @ as a leading character, ignoring read.
[WARNING]: reading >M01522:110:000000000-A4LP7:1:2102:19304:20673 expected a name with + as a leading character, ignoring.[WARNING]: names do not match. read >M01522:110:000000000-A4LP7:1:2101:15331:27248 for fasta and >M01522:110:000000000-A4LP7:1:2102:19304:20673 for quality, ignoring.[WARNING]: reading >M01522:110:000000000-A4LP7:1:1111:7152:16001 expected a name with @ as a leading character, ignoring read.
[WARNING]: reading >M01522:110:000000000-A4LP7:1:2111:23748:26157 expected a name with + as a leading character, ignoring.[WARNING]: names do not match. read >M01522:110:000000000-A4LP7:1:1111:7152:16001 for fasta and >M01522:110:000000000-A4LP7:1:2111:23748:26157 for quality, ignoring.[WARNING]: reading >M01522:110:000000000-A4LP7:1:2106:16051:9557 expected a name with @ as a leading character, ignoring read.
[WARNING]: reading >M01522:110:000000000-A4LP7:1:2108:16030:10132 expected a name with + as a leading character, ignoring.[WARNING]: names do not match. read >M01522:110:000000000-A4LP7:1:2106:16051:9557 for fasta and >M01522:110:000000000-A4LP7:1:2108:16030:10132 for quality, ignoring.[WARNING]: Lengths do not match for sequence >M01522:110:000000000-A4LP7:1:2106:16051:9557. Read 301 characters for fasta and 300 characters for quality scores, ignoring read.[WARNING]: reading >M01522:110:000000000-A4LP7:1:2111:15388:9731 expected a name with @ as a leading character, ignoring read.
[WARNING]: reading >M01522:110:000000000-A4LP7:1:1105:5705:13404 expected a name with + as a leading character, ignoring.[WARNING]: names do not match. read >M01522:110:000000000-A4LP7:1:2111:15388:9731 for fasta and >M01522:110:000000000-A4LP7:1:1105:5705:13404 for quality, ignoring.[WARNING]: Lengths do not match for sequence >M01522:110:000000000-A4LP7:1:2111:15388:9731. Read 301 characters for fasta and 300 characters for quality scores, ignoring read.[WARNING]: reading >M01522:110:000000000-A4LP7:1:1109:23481:7800 expected a name with @ as a leading character, ignoring read.
[WARNING]: reading >M01522:110:000000000-A4LP7:1:1107:8980:15947 expected a name with + as a leading character, ignoring.[WARNING]: names do not match. read >M01522:110:000000000-A4LP7:1:1109:23481:7800 for fasta and >M01522:110:000000000-A4LP7:1:1107:8980:15947 for quality, ignoring.[WARNING]: Lengths do not match for sequence >M01522:110:000000000-A4LP7:1:1109:23481:7800. Read 300 characters for fasta and 282 characters for quality scores, ignoring read.[WARNING]: reading >M01522:110:000000000-A4LP7:1:2101:19362:13304 expected a name with @ as a leading character, ignoring read.
This continued for so long that I had to quit the program!
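
The warnings above complain about three things: a header that does not start with "@", a separator that does not start with "+", and base and quality strings of different lengths. One way to tell whether the fastq file itself is damaged, or whether mothur is simply being pointed at the wrong file or column, is to run the same three checks independently. A minimal sketch in Python; `check_fastq` is a hypothetical helper written for this illustration, not a mothur command, and it assumes the usual four-lines-per-read layout:

```python
from itertools import islice


def check_fastq(lines):
    """Return (record_number, problem) pairs for malformed 4-line FASTQ records."""
    problems = []
    stream = (line.rstrip("\r\n") for line in lines)
    number = 0
    while True:
        record = list(islice(stream, 4))  # read one record at a time
        if not record:
            break
        number += 1
        if len(record) < 4:
            problems.append((number, "truncated record"))
            break
        name, seq, plus, qual = record
        if not name.startswith("@"):
            problems.append((number, "header does not start with @"))
        if not plus.startswith("+"):
            problems.append((number, "separator does not start with +"))
        if len(seq) != len(qual):
            problems.append(
                (number, f"{len(seq)} bases but {len(qual)} quality scores")
            )
    return problems
```

Calling `check_fastq(open("your_reads.fastq"))` on each file (the file name here is a placeholder) returns an empty list when every record is well formed. If the files pass, the warnings more likely come from how the file file or index column is being interpreted than from the reads themselves.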

I’m not sure where the errors are coming from; maybe my stability file or index file is wrong? I think part of the problem is that I only have one R1 file and one R2 file from MR DNA containing all the samples, and I don’t have individual fastq files. Are these something I need to make myself? The company carried out taxonomic analysis for us in QIIME, so I also have full fasta and qual files. We needed to do some extra analysis, which is why I’ve started working with the raw data in mothur; unfortunately nobody in our group has prior experience…

Your help is greatly appreciated, (or anyone else who will hopefully read this)

Best wishes,

Ciara Keating

Microbial Ecology Laboratory,
Microbiology Department
National University of Ireland, Galway
Ireland

I think you likely have some extra spaces in the file file. Also, hyphens are strongly discouraged in file names. Can you convert the hyphens in the file/sample names to underscores and try again?
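
Stray spaces are hard to see in a text editor, so a small script can flag them along with any hyphens. The sketch below is plain Python, not part of mothur; `check_file_file` is a hypothetical name, and the accepted column counts (2, 3 or 4) are an assumption that should be checked against the make.contigs documentation for your mothur version:

```python
def check_file_file(lines, expected_columns=(2, 3, 4)):
    """Return (line_number, problem) pairs for a make.contigs file file."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if line != line.strip():
            problems.append((number, "leading or trailing whitespace"))
        if "  " in line.strip():
            problems.append((number, "consecutive spaces"))
        fields = line.split()
        if len(fields) not in expected_columns:  # assumed valid layouts
            problems.append((number, f"{len(fields)} columns"))
        if any("-" in field for field in fields):
            problems.append((number, "hyphen in a name"))
    return problems
```

Running `check_file_file(open("stabilityfive.file"))` on the file from the earlier post would list each offending line number with the reason, which makes it easier to clean the file by hand.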

Hi Pat,

Thank you so much for the response. Yes, it appeared there were spaces in the file file, and I have fixed this. I then changed the hyphens to underscores in the names of the forward and reverse read files. I could not do this for the stability file because, as you can see, I have only one forward read file and one reverse read file from the sequencing company (Sam1-16_L001_R1_001 and Sam1-16_L001_R2_001), and I cannot see the individual forward and reverse reads per sample. I’m sure it’s something incredibly silly that I am not realising.

My mapping file contains this information

#SampleID BarcodeSequence LinkerPrimerSequence BarcodeName ProjectName Description
R1.E.cDNA GAGTCACT GTGCCAGCMGCCGCGGTAA 515Fbar10 110713CK515F R1.E.cDNA
R1.E.DNA GAGTAGTG GTGCCAGCMGCCGCGGTAA 515Fbar9 110713CK515F R1.E.DNA
R1.FE.cDNA GAGTCTCA GTGCCAGCMGCCGCGGTAA 515Fbar12 110713CK515F R1.FE.cDNA
R1.FE.DNA GAGTCAGA GTGCCAGCMGCCGCGGTAA 515Fbar11 110713CK515F R1.FE.DNA
R1.I.cDNA GAGATCAG GTGCCAGCMGCCGCGGTAA 515Fbar2 110713CK515F R1.I.cDNA
R1.I.DNA GAGAGTGT GTGCCAGCMGCCGCGGTAA 515Fbar1 110713CK515F R1.I.DNA
R1.S1.cDNA GAGATGAC GTGCCAGCMGCCGCGGTAA 515Fbar4 110713CK515F R1.S1.cDNA
R1.S1.DNA GAGATCTC GTGCCAGCMGCCGCGGTAA 515Fbar3 110713CK515F R1.S1.DNA
R1.S2.cDNA GAGTACAG GTGCCAGCMGCCGCGGTAA 515Fbar6 110713CK515F R1.S2.cDNA
R1.S2.DNA GAGATGTG GTGCCAGCMGCCGCGGTAA 515Fbar5 110713CK515F R1.S2.DNA
R1.S5.cDNA GAGTAGAC GTGCCAGCMGCCGCGGTAA 515Fbar8 110713CK515F R1.S5.cDNA
R1.S5.DNA GAGTACTC GTGCCAGCMGCCGCGGTAA 515Fbar7 110713CK515F R1.S5.DNA
R2.Day.531.cDNA GAGTGACA GTGCCAGCMGCCGCGGTAA 515Fbar14 110713CK515F R2.Day.531.cDNA
R2.Day.531.DNA GAGTCTGT GTGCCAGCMGCCGCGGTAA 515Fbar13 110713CK515F R2.Day.531.DNA
R3.Day.531.cDNA GAGTGTCT GTGCCAGCMGCCGCGGTAA 515Fbar16 110713CK515F R3.Day.531.cDNA
R3.Day.531.DNA GAGTGAGT GTGCCAGCMGCCGCGGTAA 515Fbar15 110713CK515F R3.Day.531.DNA
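
A mapping file like the one above already holds the pieces an oligos file needs: sample name, barcode and forward primer. The following Python sketch turns those columns into `primer` and `barcode` lines. It rests on several assumptions that are not confirmed in this thread: that the paired layout is `primer <forward> <reverse>` and `barcode <forward barcode> <reverse barcode> <sample>`, that the reverse primer is supplied by the caller (the 806r sequence is not given here), and that NONE is an acceptable reverse barcode. An earlier post reports mothur rejecting NONE with "barcodes must be paired", so the real reverse barcode or index information has to come from the sequencing provider. `mapping_to_oligos` is a hypothetical helper name.

```python
def mapping_to_oligos(mapping_lines, reverse_primer, reverse_barcode="NONE"):
    """Build oligos-style lines from a whitespace-separated mapping file.

    Columns used: 1 = sample ID, 2 = barcode, 3 = forward (linker) primer.
    The output layout is an assumption; verify it against the mothur docs.
    """
    out = []
    primer_written = False
    for raw in mapping_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip the header and blank lines
        fields = line.split()
        sample, barcode, forward_primer = fields[0], fields[1], fields[2]
        if not primer_written:
            # the primer pair is written once, taken from the first sample row
            out.append(f"primer {forward_primer} {reverse_primer}")
            primer_written = True
        out.append(f"barcode {barcode} {reverse_barcode} {sample}")
    return out
```

Writing the result with `"\n".join(...)` to a text file gives a starting point to compare against whatever oligos file the provider supplies, rather than a file to trust as-is.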

I think maybe I’m making a mistake somewhere with the barcode information or the oligo or index file?

I’m wondering whether there is a way of working with the mapping file and the full fasta and qual files from Mr. DNA instead?

Your help is really appreciated, as you can see I am a complete novice!

Best wishes,

Ciara

It would be easier to get MrDNA to give you an oligos file than for us to incorporate whatever they’re doing :)

Can you send us your updated oligos file and your compressed fastq files via dropbox or google drive?

Pat (pdschloss@gmail.com)

Hi Pat,

Thank you so much for helping. I will send the files via Googledrive now. I will also respond to your email re the workshop now.

Best wishes,

Ciara