Sequence Length

gkuffel · May 2, 2017, 9:03pm

Hi everyone,

I have sequenced the v4 region of the 16S rRNA gene on a MiSeq with a 500 cycle kit (2 x 250 bp reads). Can anyone tell me why the majority of my read lengths are around 290 bases when I run summary.seqs? How is this possible if the reads are literally only 250 bp long?

Kendra · May 3, 2017, 1:31pm

Did you read through this thread? https://www.mothur.org/forum/viewtopic.php?f=1&t=4202&start=20

gkuffel · May 3, 2017, 3:12pm

Thank you. I just read through the whole thread and it was helpful however I am still confused. We use the Caporaso primers for the V4 region (515F & 806R). I follow the Illumina protocol “16S Metagenomic Sequencing Library Preparation” which utilizes a two-step PCR approach. I then follow the Mothur MiSeq SOP for my analysis. Should I be clipping adapters and primers before I begin the Mothur pipeline?

Kendra · May 3, 2017, 3:40pm

The Caporaso primers already have the adapters, why are you using the illumina protocol to add the adapters?

Do you mean that you use 515F and 806R but not the rest of the Caporaso primers? Are you using custom sequencing primers?

gkuffel · May 3, 2017, 5:09pm

Exactly. Specifically we use the following in the first PCR:

TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG [Illumina forward overhang] GTGCCAGCMGCCGCGGTAA [515F Caporaso et al]

GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG [Illumina reverse overhang] GGACTACHVGGGTWTCTAAT [806R Caporaso et al]

And in the 2nd PCR we add Illumina barcodes using the Nextera XT index 1 and 2 primers.

Kendra · May 5, 2017, 4:11pm

ok in this case you do need to trim the primers. Your length makes sense because you are sequencing the primers and part of the adapters.

BManil · May 8, 2017, 12:36pm

Hi everyone ,

I am using mothur for my illumina Miseq data (v3 kit 300bp paired-end read/ 341F-785R/ V3-V4 regions) when I run summary.seqs after make.contigs I had this :
the minimum was 297bp
the max was 543bp
the mean was 427bp

is it normal?! (it is supposed to be 300 pb?!!!) I am confused how can figure out the min length and max length in screen.seqs command.

thank you !

campenr · May 8, 2017, 2:34pm

You have sequenced a region of the 16s gene that spans from base number 341 to base 785 (from your primer names). If you take the difference of these sites you get 444bp which is around the length of the fragment you are getting so it all makes perfect sense.

The v3 kit from Illumina generates 600 bp of Paired End reads by generating a 300bp forward read and a 300bp reverse read. If your target region is larger than 300bp then these forward and reverse reads will not fully overlap and therefore your contig will be longer than 300bp.

This is a relevant read.

In terms of picking min/max lengths for the screen.seqs command there is no correct answer. Variation in the 16S gene means that its not unreasonable that sequences larger than your mean length are legitimate, but on average the longer sequences are usually trash. Pick a value maybe 10% greater/less than the mean as a starting point and compare to something like a 20% greater/less than the mean and see how that affects your results. I found once I was trimming too stringently and losing some archaeal sequences.

BManil · May 9, 2017, 9:28am

thank you campenr for the clarification, not confused anymore

jlawrence · May 30, 2019, 1:57am

Hello, I have some followup questions on this topic. I similarly have V4 .fastq data from a 2x250 MiSeq run and it seems I too need to remove the primers (my reads were also assembling to ~290). I have 28 .fastq files for 14 samples and am following the MiSeq SOP. My oligos.txt file looks like this currently:

primer GTGYCAGCMGCCGCGGTAA GGACTACNVGGGTWTCTAAT
# BARCODE none none JG01
# BARCODE none none JG02
…etc.

make.contigs(file=stability.files, oligos=oligos.txt, processors=4)

Now reads assemble to the expected 253 bp length instead of 292, but I still have a few questions:

Do I need to account for degeneracy (if that’s the correct term) in these primers by listing all the possible permutations in my oligos file? (e.g this post)
make.contigs() throws a warning: * [WARNING]: your oligos file does not contain any group names. mothur will not create a groupfile. *. How do I work around this if barcodes are already removed and each sample is split into forward and reverse fastq files? Without a groupfile, do all the sequences end up in the same group, even if stability.files has the sample IDs?
Could leaving the primers in previously be a cause for why my batch script was crashing during cluster.split() - perhaps the primers caused excessive unique sequences (per the blog post here)?

Appreciate any suggestions.

pschloss · May 30, 2019, 5:45pm

Would you mind starting a new threa to answer this question? The current thread is a few years old.

jlawrence · May 30, 2019, 11:57pm

Not at all, wasn’t sure which would be preferred

Topic		Replies	Views
how to select minimum and maximum length of reads from summary seqs Theory behind mothur	6	3249	March 24, 2017
Choosing length after first command line Commands in mothur	4	3018	September 15, 2014
Sequence length Theory behind mothur	26	9072	May 2, 2017
Interpretation of mothur summary.seqs Commands in mothur	6	562	March 31, 2024
How to decide maxlength? Theory behind mothur	4	4315	July 20, 2015

Sequence Length

Related topics