Removing Sequence Primers

Hello all,

I am new to MOTHUR and have just received sequences from a MiSeq paired-end run. The sequences had already been demultiplexed and had the barcode primers removed. However, the sequencing primers were not removed, and I was given a file with the primers (515f and 806r) so that I could remove them. I am not sure how to do this.

I have tried using trim.seqs before and after make.contigs. When I use trim.seqs with my oligos file before make.contigs, I end up losing all of my sequences after the filter.seqs step. When I use make.contigs followed by trim.seqs with my oligos file, I later get a mismatch error after the unique.seqs step. I have also tried going from the make.contigs step through to the count.seqs step and then trimming the oligos, but this causes MOTHUR to crash when asked for a summary.

I’m not sure what is going on here, but any help would be greatly appreciated!

Thanks,

Jessie

Are your sequences in separate files, two for each sample? Have you tried running this for each sample:

mothur > make.contigs(ffastq=yourForwardFileSample1, rfastq=yourReverseFileSample1, oligos=yourOligosFile, pdiffs=2)
mothur > make.contigs(ffastq=yourForwardFileSample2, rfastq=yourReverseFileSample2, oligos=yourOligosFile, pdiffs=2)
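Typing one make.contigs line per sample gets tedious with many samples. The Python sketch below is one way to generate those lines; it is not part of mothur, and the naming pattern it expects (<sample>_S<n>_L001_R1_001.fastq, optionally .gz) is an assumption based on the file names that appear later in this thread. The printed commands can be pasted at the mothur prompt.

```python
import re

# Assumed Illumina-style naming, e.g. Sample_S11_L001_R1_001.fastq(.gz).
READ_PATTERN = re.compile(r"(.+)_R([12])_001\.fastq(?:\.gz)?$")


def pair_fastq(filenames):
    """Group R1/R2 files by their shared prefix; drop files without a mate."""
    reads = {}
    for name in filenames:
        match = READ_PATTERN.match(name)
        if match:
            prefix, read = match.group(1), match.group(2)
            reads.setdefault(prefix, {})[read] = name
    return {
        prefix: (pair["1"], pair["2"])
        for prefix, pair in sorted(reads.items())
        if "1" in pair and "2" in pair
    }


def make_contigs_commands(filenames, oligos, pdiffs=2):
    """Build one make.contigs command per sample pair."""
    return [
        f"make.contigs(ffastq={r1}, rfastq={r2}, oligos={oligos}, pdiffs={pdiffs})"
        for r1, r2 in pair_fastq(filenames).values()
    ]


files = [
    "Acetate-T1-Anaerobic_S11_L001_R1_001.fastq",
    "Acetate-T1-Anaerobic_S11_L001_R2_001.fastq",
    "Acetate-T2-Anaerobic_S17_L001_R1_001.fastq",
    "Acetate-T2-Anaerobic_S17_L001_R2_001.fastq",
]
for command in make_contigs_commands(files, "miseq_oligos_sopformat.txt"):
    print(command)
```

Files whose mate is missing are silently skipped, so it is worth checking that the number of printed commands matches the number of samples.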

Then you can make a group file and merged fasta file:

mothur > make.group(fasta=yourFastaFileSample1-yourFastaFileSample2…, groups=sample1-sample2…)
mothur > merge.files(input=yourFastaFileSample1-yourFastaFileSample2…, output=merged.fasta)
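The fasta= and groups= values above are dash-separated lists, and the two lists must be in the same order. A small Python sketch can build both commands from one sample-to-file mapping so the order cannot drift. The file names in it are hypothetical; use the FASTA names that make.contigs actually reports. As a precaution, it rejects names that contain a dash, since a dash inside a name could be read as a separator.

```python
def dash_list(items):
    """Join names with '-', the separator used in the commands above."""
    for item in items:
        # Precaution: a '-' inside a name could be read as a separator.
        if "-" in item:
            raise ValueError(f"name contains '-': {item}")
    return "-".join(items)


def group_and_merge_commands(samples, output="merged.fasta"):
    """samples maps each group name to that sample's FASTA file."""
    fastas = dash_list(list(samples.values()))
    groups = dash_list(list(samples.keys()))
    return [
        f"make.group(fasta={fastas}, groups={groups})",
        f"merge.files(input={fastas}, output={output})",
    ]


# Hypothetical names: substitute the FASTA files make.contigs produced.
samples = {
    "sample1": "sample1.trim.contigs.fasta",
    "sample2": "sample2.trim.contigs.fasta",
}
for command in group_and_merge_commands(samples):
    print(command)
```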

You can then continue with the MiSeq SOP, http://www.mothur.org/wiki/MiSeq_SOP.

mothur > summary.seqs(fasta=merged.fasta)
mothur > screen.seqs(fasta=current, group=yourGroupFile, maxambig=0, maxlength=275)

Thank you for the fast reply. I did try that, but I still get an error about paired barcodes and primers. It looks like this:

mothur > make.contigs(ffastq=Acetate-T1-Anaerobic_S11_L001_R1_001.fastq,rfastq=Acetate-T1-Anaerobic_S11_L001_R2_001.fastq,oligos=miseq_oligos_sopformat.txt,pdiffs=2,processors=8)

Using 8 processors.
Reading fastq data…

119174
Done.

Processing Acetate-T1-Anaerobic_S11_L001_R1_001.0ffastatemp (file 1 of 1) <<<<<
[ERROR]: make.contigs requires paired barcodes and primers. You can set one end to NONE if you are using an index file.
Making contigs…
[WARNING]: your sequence names contained ‘:’. I changed them to ‘_’ to avoid problems in your downstream analysis.

mothur > make.contigs(ffastq=Acetate-T2-Anaerobic_S17_L001_R1_001.fastq,rfastq=Acetate-T2-Anaerobic_S17_L001_R2_001.fastq,oligos=miseq_oligos_sopformat.txt,pdiffs=2,processors=8)

Using 8 processors.
Reading fastq data…

I’ve tried formatting the oligos file in different ways but that does not seem to work either.

Thank you,

Jessie

Can you post your oligos file?

Hi,
I am having exactly the same problem. In my case, the barcodes and indexes were removed by the sequencing facility, and because I used degenerate primers, I listed all the possible combinations. Is that correct? My oligos file looks like this:

primer AAACTTAAAGGAATTGGCGG ACGGGCGGTGTGTGC
primer AAACTTAAAGGAATTGGCGG ACGGGCGGTGTGTAC
primer AAACTTAAAGGAATTGACGG ACGGGCGGTGTGTGC
primer AAACTTAAAGGAATTGACGG ACGGGCGGTGTGTAC
primer AAACTTAAATGAATTGGCGG ACGGGCGGTGTGTGC
primer AAACTTAAATGAATTGGCGG ACGGGCGGTGTGTAC
primer AAACTTAAATGAATTGACGG ACGGGCGGTGTGTGC
primer AAACTTAAATGAATTGACGG ACGGGCGGTGTGTAC
primer AAACTCAAAGGAATTGGCGG ACGGGCGGTGTGTGC
primer AAACTCAAAGGAATTGGCGG ACGGGCGGTGTGTAC
primer AAACTCAAAGGAATTGACGG ACGGGCGGTGTGTGC
primer AAACTCAAAGGAATTGACGG ACGGGCGGTGTGTAC
primer AAACTCAAATGAATTGGCGG ACGGGCGGTGTGTGC
primer AAACTCAAATGAATTGGCGG ACGGGCGGTGTGTAC
primer AAACTCAAATGAATTGACGG ACGGGCGGTGTGTGC
primer AAACTCAAATGAATTGACGG ACGGGCGGTGTGTAC

Is this correct, or should I add “imaginary” barcodes? When I run make.contigs, the error is:
556091
Done.

Processing BFP2_S13_L001_R1_001.0ffastatemp (file 1 of 1) <<<<<
[ERROR]: make.contigs requires paired barcodes and primers. You can set one end to NONE if you are using an index file.
Making contigs…
[WARNING]: your sequence names contained ‘:’. I changed them to ‘_’ to avoid problems in your downstream analysis.

Thanks!!
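The 16 primer lines in the post above are every combination of two degenerate primers, which can be written with IUPAC codes as AAACTYAAAKGAATTGRCGG (forward) and ACGGGCGGTGTGTRC (reverse). The Python sketch below expands them mechanically, which is less error-prone than listing the combinations by hand. It is only an illustration, not a mothur feature. mothur's primer matching is generally understood to accept IUPAC ambiguity codes directly, in which case a single primer line with the degenerate sequences may be enough, but confirm that against your mothur version.

```python
from itertools import product

# Standard IUPAC nucleotide codes (a subset, enough for these primers).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT", "N": "ACGT",
}


def expand(primer):
    """Return every concrete sequence a degenerate primer stands for."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]


def primer_lines(forward, reverse):
    """One oligos 'primer' line per forward/reverse combination."""
    return [f"primer {f} {r}" for f in expand(forward) for r in expand(reverse)]


lines = primer_lines("AAACTYAAAKGAATTGRCGG", "ACGGGCGGTGTGTRC")
print(len(lines))  # 16
print("\n".join(lines))
```

The lines come out in a different order than in the post, but the set of combinations is the same.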

This is a bug that will be fixed in the next release. The issue occurs because your oligos file does not contain any barcodes. As a workaround, add a barcode line to the end of your oligos file so that it looks like this:

primer AAACTTAAAGGAATTGGCGG ACGGGCGGTGTGTGC
primer AAACTTAAAGGAATTGGCGG ACGGGCGGTGTGTAC
primer AAACTTAAAGGAATTGACGG ACGGGCGGTGTGTGC
primer AAACTTAAAGGAATTGACGG ACGGGCGGTGTGTAC
primer AAACTTAAATGAATTGGCGG ACGGGCGGTGTGTGC
primer AAACTTAAATGAATTGGCGG ACGGGCGGTGTGTAC
primer AAACTTAAATGAATTGACGG ACGGGCGGTGTGTGC
primer AAACTTAAATGAATTGACGG ACGGGCGGTGTGTAC
primer AAACTCAAAGGAATTGGCGG ACGGGCGGTGTGTGC
primer AAACTCAAAGGAATTGGCGG ACGGGCGGTGTGTAC
primer AAACTCAAAGGAATTGACGG ACGGGCGGTGTGTGC
primer AAACTCAAAGGAATTGACGG ACGGGCGGTGTGTAC
primer AAACTCAAATGAATTGGCGG ACGGGCGGTGTGTGC
primer AAACTCAAATGAATTGGCGG ACGGGCGGTGTGTAC
primer AAACTCAAATGAATTGACGG ACGGGCGGTGTGTGC
primer AAACTCAAATGAATTGACGG ACGGGCGGTGTGTAC
barcode none none myGroup

Mothur will remove the primers and assign all of your sequences to myGroup. You can disregard the group file that mothur creates.
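If you rebuild the oligos file often, the workaround line can be appended automatically. This Python sketch writes the primer pairs followed by barcode none none myGroup, exactly as in the reply above; the output file name workaround.oligos is hypothetical, and only two of the sixteen primer pairs are shown.

```python
from pathlib import Path


def write_workaround_oligos(path, primer_pairs, group="myGroup"):
    """Write primer lines followed by the 'barcode none none <group>' line."""
    lines = [f"primer {forward} {reverse}" for forward, reverse in primer_pairs]
    lines.append(f"barcode none none {group}")
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


pairs = [
    ("AAACTTAAAGGAATTGGCGG", "ACGGGCGGTGTGTGC"),
    ("AAACTTAAAGGAATTGGCGG", "ACGGGCGGTGTGTAC"),
    # ... remaining primer pairs from the list above
]
write_workaround_oligos("workaround.oligos", pairs)  # hypothetical file name
```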

Dear Westcott,

Thank you very much for maintaining this great forum. I wish you all a very happy new year.

This topic is from quite a long time ago, but I would like to know how I can make the oligos file.

Like the sequences in the questions above, mine are in separate files, two for each sample.

For instance, one folder I received, VHT-100_L001-ds.07464fe87bb84036864cb045ac44c911/, contains VHT100_R1.fastq.gz and VHT100_R2.fastq.gz.

I also have an Excel file with the adapters and the dual (2x) indexes.

Kind regards,
Truong

The oligos file format is described in the oligos file documentation. You will want something like:

primer forwardPrimer reversePrimer
barcode forwardBarcode reverseBarcode sampleName
...

You will also need to create a four-column file file listing the forward reads, reverse reads, forward index file, and reverse index file. For example:

VHT100_R1.fastq.gz VHT100_R2.fastq.gz forwardIndexFile reverseIndexFile

Then use the make.contigs command to assemble your paired reads, assign them to samples, and remove the primers.

mothur > make.contigs(file=yourFile, bdiffs=1, pdiffs=2, oligos=yourOligosFile)

Kindly,
Sarah
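With many samples, the four-column file file can be generated rather than typed. This Python sketch is only an illustration: the output name vht.files is hypothetical, the index-file names are the placeholders from the example above, and tabs are used as separators on the assumption that mothur reads them the same way it reads the file files produced by make.file.

```python
import csv


def write_file_file(path, rows):
    """Write a four-column file file: R1, R2, forward index, reverse index.

    rows is an iterable of 4-tuples; columns are tab-separated.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for row in rows:
            if len(row) != 4:
                raise ValueError(f"expected 4 columns, got {len(row)}: {row!r}")
            writer.writerow(row)


# Placeholder index-file names from the example; output name is hypothetical.
write_file_file("vht.files", [
    ("VHT100_R1.fastq.gz", "VHT100_R2.fastq.gz", "forwardIndexFile", "reverseIndexFile"),
])
```

The column-count check catches rows exported from a spreadsheet with a missing or extra cell before mothur sees them.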

Dear Sarah,

Thank you so much for the quick response.
I tried it, but there is an error. Could you please have a look below?
mothur > make.file(inputdir=., type=fastq)

stability.files:
VHT76 VHT76_R1_001.fastq VHT76_R2_001.fastq

The oligos file was exported from Excel:
VHT76_R1_001.fastq VHT76_R2_001.fastq ATTACTCG TATAGCCT

mothur > make.contigs(file=stability.files, bdiffs=1, pdiffs=2, oligos=vht76.oligos)

Using 4 processors.

Processing file pair /home/truong/Documents/mothur/metadata/test/VHT76_R1_001.fastq - /home/truong/Documents/mothur/metadata/test/VHT76_R2_001.fastq (files 1 of 1) <<<<<
[WARNING]: VHT76_R1_001.FASTQ is not recognized as a valid type. Choices are forward, reverse, and barcode. Ignoring VHT76_R2_001.FASTQ.
[WARNING]: ATTACTCG is not recognized as a valid type. Choices are forward, reverse, and barcode. Ignoring TATAGCCT.
[ERROR]: invalid oligos file, quitting.
Making contigs…

It took 1 secs to assemble 0 reads.

Kind regards,
Truong

Hi Truong,
The issue above is caused by the oligos file, which should not include file names. Do you have index files? If so, the file file is not correct either. If you only have one set of fastq files, you don’t need to create a file file; you can enter the file names directly into the make.contigs command as follows:

mothur > make.contigs(ffastq=VHT76_R1_001.fastq, rfastq=VHT76_R2_001.fastq, findex=nameOfForwardIndexFile, rindex=nameOfReverseIndexFile, bdiffs=1, oligos=vht76.oligos)

vht76.oligos file:

barcode ATTACTCG TATAGCCT VHT76

Kindly,
Sarah
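With many samples, barcode lines like the one above can be built from the sample sheet instead of typed. The Python sketch below assumes each row gives the sample ID, the i7 index, and the i5 index, used in the same order as the example line (i7 first, i5 second). Because the oligos file is whitespace-separated, it removes spaces from sample names (so "VHT 76" becomes VHT76). It uses the index sequences exactly as given; whether your indexes need to be reverse-complemented depends on the sequencing run and is not handled here.

```python
def barcode_lines(sample_rows):
    """Turn (sample_id, i7_index, i5_index) rows into oligos 'barcode' lines.

    Spaces are stripped from sample IDs because oligos columns are
    whitespace-separated. Index sequences are only upper-cased, never
    reverse-complemented.
    """
    lines = []
    for sample_id, i7_index, i5_index in sample_rows:
        name = "".join(sample_id.split())
        lines.append(f"barcode {i7_index.upper()} {i5_index.upper()} {name}")
    return lines


rows = [("VHT 76", "ATTACTCG", "TATAGCCT")]  # values from the sample sheet in this thread
print("\n".join(barcode_lines(rows)))  # barcode ATTACTCG TATAGCCT VHT76
```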

Hi Sarah,
That is very kind of you. I have done some of the steps from your comments above and followed the make.group guideline.
Have a nice weekend.
Cheers,
Truong

Hi Sarah,

Yes, I have an Excel file with the table below.

Sample_ID  I7_Index_ID  index     I5_Index_ID  index2
VHT 76     D701         ATTACTCG  D501         TATAGCCT

ReverseComplement  0
Adapter            AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
AdapterRead2       AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

I tried your command line, but it is not working:

mothur > make.contigs(ffastq=VHT76_R1_001.fastq, VHT76_R2_001.fastq, findex=findex, rindex=rindex, bdiffs=1, oligos=vht76_bacodes_test.oligos)
[WARNING]: VHT76_R2_001.fastq is not a valid parameter, ignoring.
The valid parameters are: ffastq, rfastq, ffasta, rfasta, fqfile, rqfile, file, oligos, findex, rindex, qfile, pdiffs, bdiffs, tdiffs, checkorient, align, allfiles, trimoverlap, match, mismatch, gapopen, gapextend, insert, deltaq, maxee, processors, format, ksize, seed, inputdir, and outputdir.
[ERROR]: If you provide use the ffastq, you must provide a rfastq file.

Using 4 processors.
[ERROR]: did not complete make.contigs.


However, when I tried your earlier instructions, they seemed to work:

Then you can make a group file and merged fasta file:

mothur > make.group(fasta=yourFastaFileSample1-yourFastaFileSample2…, groups=sample1-sample2…)
mothur > merge.files(input=yourFastaFileSample1-yourFastaFileSample2…, output=merged.fasta)

Then continue with the MiSeq SOP, http://www.mothur.org/wiki/MiSeq_SOP.

make.contigs(file=stability.paired.files, oligos=vht76_bacodes_test.oligos, pdiffs=2)

Processing file pair VHT76_R1_001.fastq - VHT76_R2_001.fastq (files 1 of 1) <<<<<
Making contigs…
Done.
It took 84 secs to assemble 229298 reads.

Output File Names:
stability.paired.trim.contigs.fasta
stability.paired.scrap.contigs.fasta
stability.paired.contigs.report
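make.contigs writes the reads it kept (those that matched the barcodes and primers) to the trim.contigs.fasta file and the reads it rejected to the scrap.contigs.fasta file. Counting the sequences in both files before screening shows where the reads ended up; note that the screen.seqs command below was run on the scrap file. This Python sketch assumes the <prefix>.trim.contigs.fasta and <prefix>.scrap.contigs.fasta naming shown in the output above.

```python
from pathlib import Path


def count_fasta_records(path):
    """Count sequences in a FASTA file by counting '>' header lines."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def trim_vs_scrap(prefix):
    """Return sequence counts for the trim and scrap FASTA files (None if missing)."""
    counts = {}
    for kind in ("trim", "scrap"):
        path = Path(f"{prefix}.{kind}.contigs.fasta")
        counts[kind] = count_fasta_records(path) if path.exists() else None
    return counts


print(trim_vs_scrap("stability.paired"))
```

If almost every read is in the scrap file, the oligos file is the first thing to check before moving on to screen.seqs.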

screen.seqs(fasta=stability.paired.scrap.contigs.fasta, group=stability.paired.scrap.contigs.groups, maxambig=0, maxlength=275)

It took 2 secs to screen 229298 sequences, removed 229132.

It removed 99.92% of my sequences!!!

But I continued anyway.

As the reference alignment, I used silva.nr_v138.align.

summary.seqs(fasta=stability.paired.scrap.contigs.good.unique.align, count=stability.paired.scrap.contigs.good.count_table)

Using 8 processors.

             Start    End  NBases  Ambigs  Polymer  NumSeqs
Minimum:         1     33       1       0        1        1
2.5%-tile:       1   1231       3       0        1        5
25%-tile:        1   1250       3       0        1       42
Median:      13389  13425       6       0        3       84
75%-tile:    13404  13425      12       0        3      125
97.5%-tile:  13422  13425      42       0        4      162
Maximum:     13425  13425      58       0        6      166
Mean:         8039   8779       9       0        2

# of unique seqs: 138
total # of seqs: 166
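In the screen.seqs call below, start and end were set to 8039 and 8799, close to the Mean row of this summary, and every unique sequence was removed. screen.seqs also lists optimize and criteria among its valid parameters, which choose the start and end positions from the data. The Python sketch below only illustrates that idea; it is not mothur's implementation. It picks positions so that at least criteria percent of the sequences, weighted by numSeqs, start at or before start and end at or after end. It assumes a tab-separated .summary file with start, end, and numSeqs columns, and the example rows are made up.

```python
import csv


def read_summary(path):
    """Read (start, end, numSeqs) rows from a tab-separated .summary file.

    Assumes a header containing 'start', 'end' and 'numSeqs' columns.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [(int(r["start"]), int(r["end"]), int(r["numSeqs"])) for r in reader]


def cutoffs(rows, criteria=90):
    """Pick (start, end) so at least `criteria` percent of the sequences,
    weighted by numSeqs, start at or before `start` and end at or after `end`."""
    total = sum(n for _, _, n in rows)

    covered, start = 0, None
    for s, _, n in sorted(rows, key=lambda r: r[0]):
        covered += n
        if covered * 100 >= criteria * total:  # integer math, no float rounding
            start = s
            break

    covered, end = 0, None
    for _, e, n in sorted(rows, key=lambda r: r[1], reverse=True):
        covered += n
        if covered * 100 >= criteria * total:
            end = e
            break
    return start, end


# Made-up rows for illustration, not the data in the summary above.
rows = [(1, 100, 50), (5, 90, 50)]
print(cutoffs(rows, criteria=100))  # (5, 90)
```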

mothur > screen.seqs(fasta=stability.paired.scrap.contigs.good.unique.align, count=stability.paired.scrap.contigs.good.count_table, stability.paired.scrap.contigs.good.unique.summary, start=8039, end=8799, maxhomop=8)
[WARNING]: stability.paired.scrap.contigs.good.unique.summary is not a valid parameter, ignoring.
The valid parameters are: fasta, contigsreport, alignreport, summary, name, count, group, qfile, taxonomy, start, end, maxambig, maxhomop, minlength, maxlength, processors, criteria, optimize, seed, inputdir, outputdir, minoverlap, ostart, oend, mismatches, maxn, minscore, maxinsert, and minsim.

Using 8 processors.

It took 0 secs to screen 138 sequences, removed 138.

It then stops at filter.seqs:

mothur > filter.seqs(fasta=stability.paired.scrap.contigs.good.unique.good.align, vertical=T, trump=.)

Using 8 processors.
[ERROR]: stability.paired.scrap.contigs.good.unique.good.align is blank. Please correct.
Creating Filter…
[ERROR]: stability.paired.scrap.contigs.good.unique.good.align is blank. Please correct.
Error in reading your fastafile, at position -1. Blank name.
It took 0 secs to create filter for 0 sequences.

Could you send your input files to mothur.bugs@gmail.com so I can track down the issue for you?

Hi Sarah, many thanks for your offer. I just sent you an email.
Cheers,
Truong