Determining the start and end positions of a reference database


Hi there,

I am trying to determine the start and end positions for the SILVA reference database by following the two tutorials.


In the first one, pcr.seqs is used to generate the product from the primer sequences. The product doesn’t include the primer sequences.

In the second one, no program is used. The product is generated manually, and it does include the primer sequences.

When the subsequent align.seqs(fasta=product.fasta, reference=silva.bacteria.fasta) and summary.seqs(fasta=product.align) steps are run, the two approaches give different results: the start and end positions differ depending on whether the product includes the primer sequences.
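To make the difference concrete, here is a minimal Python sketch of how the reported start and end shift by exactly the primer lengths. It is a toy illustration only: the reference and primer strings are made up, the helper name product_coords is hypothetical, and it works on plain string positions rather than the alignment columns that summary.seqs reports.

```python
# Toy data: none of these sequences come from SILVA or from a real primer set.
reference = "AAAACCCGGGTTTACGTACGTACGTTTGGGCCCAAAA"
fwd_primer = "CCCGGG"      # hypothetical forward primer
rev_primer_rc = "TTGGGC"   # hypothetical reverse primer, written on the reference strand


def product_coords(ref, fwd, rev_rc, keep_primers):
    """Return the 1-based, inclusive (start, end) of the amplicon on ref."""
    f = ref.find(fwd)
    if f < 0:
        raise ValueError("forward primer not found")
    r = ref.find(rev_rc, f + len(fwd))
    if r < 0:
        raise ValueError("reverse primer not found")
    if keep_primers:
        # Product spans from the first primer base to the last primer base.
        return f + 1, r + len(rev_rc)
    # Product starts after the forward primer and ends before the reverse primer.
    return f + len(fwd) + 1, r


with_primers = product_coords(reference, fwd_primer, rev_primer_rc, keep_primers=True)
without_primers = product_coords(reference, fwd_primer, rev_primer_rc, keep_primers=False)
print("with primers:   ", with_primers)
print("without primers:", without_primers)
```

In this toy case the two coordinate pairs differ by the primer length at each end, which is the same kind of offset I see between the two tutorials.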

Does this affect downstream steps such as pcr.seqs(fasta=silva.bacteria.fasta, start=start, end=end, keepdots=F, processors=8) and align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.v4.fasta)?

Which approach do you recommend: a product that includes the primer sequences, or one that doesn’t?

Many thanks,



It’s probably best to remove the primer sequences since you won’t want those in your final analysis.




In my fastq files, the Illumina adaptor overhang sequences have been removed, but the primer sequences are still present.

Could you advise on whether the primer sequences should be removed from the fastq files before creating contigs?

Or is it enough to account for the primer removal when creating the customized reference alignment?

Many thanks,



The make.contigs command has an oligos option that takes a file listing your primer sequences. If you provide one, make.contigs will remove the primers. This is described on the wiki page for the command.
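If it helps to see what primer removal amounts to, here is a rough Python sketch of trimming a degenerate primer from the start of a read. This is not how make.contigs is implemented, and it is not a substitute for the oligos option. The function name strip_leading_primer, the max_mismatches parameter, and the example read are all assumptions made for illustration.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def strip_leading_primer(read, primer, max_mismatches=0):
    """Return read without its leading primer, or None if the primer does not match."""
    if len(read) < len(primer):
        return None
    # Count positions where the read base is not allowed by the primer's IUPAC code.
    mismatches = sum(base not in IUPAC[code] for base, code in zip(read, primer))
    if mismatches > max_mismatches:
        return None
    return read[len(primer):]


# Example degenerate primer (M stands for A or C) and a made-up read.
primer = "GTGCCAGCMGCCGCGGTAA"
insert = "TACGGAGGGTGCA"
read = "GTGCCAGCAGCCGCGGTAA" + insert

trimmed = strip_leading_primer(read, primer)
print(trimmed)
```

A read whose first bases do not match the primer returns None here, so in a sketch like this you would have to decide whether to keep or discard such reads.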