Choosing pdiffs value and # before reverse primer

Hello to all Mothur members,

Is the value for pdiffs chosen according to the number of degenerate bases in the oligos?

I have 1 degenerate base in the forward primer and 4 in the reverse primer. So am I correct to choose pdiffs=4?

Secondly, my sequences are already demultiplexed, but I saw that mothur needs the forward primer, reverse primer, and barcode in the oligos file (the required format). So I include an imaginary barcode in the file but put # before it, so that the file has the correct format and mothur does not read the barcode (because of the # at the start of the line). Is this approach correct?

I don't put # before the reverse primer because I can clearly see the reverse primer at the end of my sequences and I want to trim it. Am I correct to do this?

Any suggestion will be helpful. Thanks in advance.

Richa

No, the pdiffs value has nothing to do with the number of degeneracies in the primer. pdiffs should be set to 2. What protocol was used to generate your sequences? If you use the Kozich method or the Caporaso method, then there is no need for an oligos file in make.contigs.

pat
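To illustrate why the number of degenerate bases does not need to be reflected in pdiffs, here is a minimal Python sketch. It assumes that a degenerate primer position is compared by IUPAC compatibility, so any base the code allows counts as a match. The primer and reads are hypothetical, and this is an illustration of the idea, not mothur's actual implementation.

```python
# IUPAC nucleotide codes mapped to the bases each code stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_primer_diffs(primer, read_prefix):
    """Count positions where the read base is not allowed by the primer code."""
    if len(read_prefix) < len(primer):
        raise ValueError("read is shorter than the primer")
    diffs = 0
    for code, base in zip(primer.upper(), read_prefix.upper()):
        if base not in IUPAC[code]:
            diffs += 1
    return diffs


# Hypothetical primer with two degenerate positions (Y and M).
primer = "GTGYCAGCM"
print(count_primer_diffs(primer, "GTGCCAGCA"))  # both degenerate positions satisfied
print(count_primer_diffs(primer, "GTGTCAGCC"))  # a different but still allowed combination
print(count_primer_diffs(primer, "GTGACAGCA"))  # A is not allowed at the Y position
```

Under this assumption, reads that differ only at degenerate positions score zero differences, so pdiffs only has to absorb genuine mismatches such as sequencing errors.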

Hello Dr. Schloss,

I have used the Caporaso method. I need the oligos file at the trim.seqs step in order to remove the primers, because the adapters and barcodes have been removed by the sequencing facility itself but the primers still have to be removed. My primers contain degenerate bases, and if I don't trim them, they will create unique sequences because of the different primer sequences.

This is why I need the oligos file. So is the approach that I use to make the oligos file correct?

If you are using the Caporaso method, you should have three files. One with the index reads, one with the forward reads, and one with the reverse reads.

My library was prepared by the sequencing facility itself and I don't know the barcode sequences that they used for my samples. After sequencing, they removed the adapters and barcodes and sent me sequences with the forward and reverse primers.

Looking forward,

Richa

I’m sorry, but based on what you’re telling me, it doesn’t sound like you are using the Caporaso method. You will want to get the complete files from your sequencing facility and find out what strategy they used to sequence the samples (a citation would be helpful).

Pat

Hi Dr. Schloss,


The sequencing facility used the same primer construct as given in Caporaso et al, 2010 (Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample). They just replaced the V4 reverse primer part with a reverse primer specific for V5. This is because my target is V4-V5. The adapter, linkers, and pad are exactly the same.

They demultiplexed the sequences and removed the adapters and barcodes (but not the forward and reverse primers). So now I have to work out how to prepare the oligos file for the trim.seqs step so that I can remove the primers. Mothur needs an oligos file in a format that contains the forward and reverse primer sequences and also the barcode sequence. Since I don't have barcodes in my sequences, I put # before an imaginary barcode, so that mothur gets the required oligos file format but does not recognize the barcode.
This is the format:
forward primer …
reverse primer…
#imaginary barcode sequence

1) This is working for me, but I want to be sure that this approach is correct.

2) My second question is: since I cannot generate a group file at or after the trim.seqs step, how can I check the number of sequences left in each library after screen.seqs and after unique.seqs?

Looking forward,
Thanks

Let me try to clear a few things up about the oligos file, and the trimming process. No lines are required in the oligos file. The barcode line is not required. By adding the # character before the “imaginary barcode sequence” line mothur is ignoring all data on that line anyway. Here is a link to the oligos file page, http://www.mothur.org/wiki/Oligos_File. You may find it helpful with formatting issues.
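The point about commented lines can be shown with a small Python sketch. It assumes a whitespace-separated "keyword sequence ..." layout like the one described in this thread and simply skips blank lines and lines starting with #. The sequences and the sample name are hypothetical, and mothur's real parser may differ in details.

```python
def read_oligos(lines):
    """Return (keyword, fields) pairs for every non-blank, non-comment line."""
    records = []
    for raw in lines:
        line = raw.strip()
        # Blank lines and lines starting with '#' carry no data.
        if not line or line.startswith("#"):
            continue
        keyword, *fields = line.split()
        records.append((keyword.lower(), fields))
    return records


# Hypothetical oligos content: two primer lines and a commented-out barcode.
example = [
    "forward ACGTACGT",
    "reverse TGCATGCA",
    "#barcode AAAAAA sampleX",
]

for keyword, fields in read_oligos(example):
    print(keyword, fields)
```

The commented barcode line never reaches the result, so leaving it out of the file entirely gives the same outcome.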

  1. As far as your approach goes, are your primers paired or separate? Mothur does not currently allow for diffs in reverse primers that are not paired. The oligos file link above has the format for both paired and non-paired primers.

  2. Are all your sequences in the same file? If they are broken into separate fasta files by barcode, you can use the make.group command, http://www.mothur.org/wiki/Make.group, with the fasta files.

mothur > make.group(fasta=yourFasta1.fasta-yourFasta2.fasta-yourFasta3.fasta, groups=A-B-C)
mothur > merge.files(input=yourFasta1.fasta-yourFasta2.fasta-yourFasta3.fasta, output=yourFull.fasta)
mothur > trim.seqs(fasta=yourFull.fasta, oligos=yourOligos.oligos, pdiffs=2, other parameter…)
mothur > list.seqs(fasta=current)
mothur > get.seqs(group=yourGroupFile, accnos=current) - removes sequences trimmed by trim.seqs from group file

Hi Westcott,

Thank you very much for the explanation.

  1. Paired primers means primers for paired-end sequencing, right? If so, then yes, my primers are paired (similar to the EMP primers).
  2. I have separate fastq files for many samples. Their barcodes have already been removed, but I can see the 20 bp primer at the two ends. I follow the MiSeq SOP. The only problem is that after make.contigs I run trim.seqs to remove the forward and reverse primers at each end, and this step does not generate a group file as output. This is because I don't (cannot) give any group name in the oligos file since, as I said before, the barcodes were already removed by the sequencing facility. After this step I am not able to see the number of sequences left in each sample after each step.

I tried to follow your suggestion but could not get a group file after trim.seqs.

mothur > make.contigs(file=stability.files, processors=4)
sample1 333333
sample2 125189
Total of all groups is 458522

Output File Names:
stability.trim.contigs.fasta
stability.contigs.report
stability.scrap.contigs.fasta
stability.contigs.group

mothur > trim.seqs(fasta=stability.trim.contigs.fasta, oligos=primer.oligos, pdiffs=2, flip=T)
[oligos file…
‘‘Primer forward primer sequence reverse primer sequence’’ (since the barcodes were removed, no barcodes are listed in the oligos file)]

Output File Names:
stability.trim.contigs.trim.fasta
stability.trim.contigs.scrap.fasta

It did not generate an accnos file either; otherwise I could use get.seqs(group=yourGroupFile, accnos=current), as you suggested.

Could you please suggest how I can make a group file after this?

Richa

In your stability.files file, you should have a list of fastq files. I assume these are separated by barcode from the sequencing center with the barcodes already removed. Here’s what you need to do to create group files for them.

for each file in the stability.files file:
fastq.info(fastq=individualSamples.fastq, fasta=t, qual=f)

Once you have a fasta file for each fastq file then you can run:

mothur > make.group(fasta=yourFasta1.fasta-yourFasta2.fasta-yourFasta3.fasta…, groups=A-B-C…)
mothur > make.contigs(file=stability.files, processors=4)
mothur > trim.seqs(fasta=stability.trim.contigs.fasta, oligos=primer.oligos, pdiffs=2, flip=T)
mothur > list.seqs(fasta=stability.trim.contigs.trim.fasta) - this command creates the accnos
mothur > get.seqs(group=yourGroupFile, accnos=current) - removes sequences trimmed by trim.seqs from group file you created above
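As a cross-check on the steps above, the per-sample counts can also be tallied outside mothur. This is a minimal Python sketch, assuming the group file has two whitespace-separated columns (sequence name, then group name) and that the trimmed fasta keeps the same sequence names. The names and data below are hypothetical.

```python
from collections import Counter


def fasta_names(fasta_lines):
    """Collect sequence names (first token after '>') from FASTA header lines."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def counts_per_group(group_lines, keep_names):
    """Count, per group, the sequences whose names are in keep_names."""
    counts = Counter()
    for line in group_lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] in keep_names:
            counts[parts[1]] += 1
    return counts


# Hypothetical data: seq3 was scrapped during trimming, so it is absent from the fasta.
trimmed_fasta = [">seq1", "ACGT", ">seq2", "ACGA", ">seq4", "ACGC"]
group_file = ["seq1\tsample1", "seq2\tsample1", "seq3\tsample2", "seq4\tsample2"]

remaining = counts_per_group(group_file, fasta_names(trimmed_fasta))
for group, n in sorted(remaining.items()):
    print(group, n)
```

With real data, the two lists would come from reading the trimmed fasta file and the group file line by line, for example with `open(...)`.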

Thanks for the help.