Sequence length

Egansbay · April 9, 2016, 8:07am

So, my seqs have been trimmed to 251bp length before proceeding to Mothur analysis. FastQC confirms the length at 251bp.

However after make.contigs and then summary.seqs, I see that the majority of seqs are listed in the table as being 292bp length.

Might be interpreting something incorrectly, but could somebody please help me on this one?

Thank you

Kendra · April 11, 2016, 8:46pm

They aren’t fully overlapped. The first and last ~25bp of your sequences aren’t overlapping.

Egansbay · April 12, 2016, 7:45am

Thanks kmitchell

So even though all seqs have been trimmed to 251bp, if they do not overlap they’ll show up in the summary table at 292bp?

pschloss · April 12, 2016, 2:09pm

The amplicon is likely ~292 nt long. make.contigs makes the contigs, so if you have 2 250 nt fragments it will stagger them to optimize the sequence conservation in the overlapping area.

Pat
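The length arithmetic here can be sketched in a few lines of Python. This is only an illustration, not what make.contigs does internally. The function names are hypothetical, and it assumes both reads are the same length and assemble end to end into a single contig.

```python
def overlap_length(read_len: int, contig_len: int) -> int:
    """Bases covered by both reads when two equal-length reads form one contig."""
    overlap = 2 * read_len - contig_len
    # Clamp to a sensible range: no negative overlap, none longer than a read.
    return max(0, min(read_len, overlap))


def overhang_length(read_len: int, contig_len: int) -> int:
    """Bases at each end of the contig that are covered by only one read."""
    return max(0, contig_len - read_len)


# Numbers from this thread: 2 x 251 bp reads, contigs of about 292 bp.
print(overlap_length(251, 292))   # 210
print(overhang_length(251, 292))  # 41
```

So a 292 bp contig built from two 251 bp reads is not a sign of a failed merge; it just means the reads overlap in the middle and each read contributes some bases of its own at one end.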

Egansbay · April 12, 2016, 7:26pm

Thanks Pat

So even though these are 2 x 251bp v4 reads, the amplicon is staggered and showing up as 292bp?

Kendra · April 13, 2016, 3:11pm

Correct, the only relationship that should be true for sequencing on MiSeq/HiSeq is that your amplicon or fragment needs to be longer than the sequence length. You could technically sequence full-length 16S on a 2x25 kit; you wouldn’t get useful information out of that, but the machine would still give you gigs of >Q30 sequences. Illumina has decent videos on YouTube that may help you understand what’s going on with paired-end sequencing.

For diverse amplicons you should be aiming for a lot of overlap. Read any of the posts about the subpar results people are getting with V1-V3 sequencing on MiSeq to get an understanding of why non-overlapping Q30 sequences are not good for amplicons.

Egansbay · April 13, 2016, 3:28pm

Thanks kmitchell

I trimmed my seqs (f and r) to 251bp as the reverse read was showing a drop in quality (when screened with fastqc) after 251bp.

Do you think that my seqs have failed to merge correctly if they’re showing up at 292bp?

Egansbay · August 11, 2016, 5:17pm

Just wondering if anybody can help with the last question as to whether my seqs have merged correctly if they’re showing up as 292bp length for 251bp PE reads?

Thanks

pschloss · August 15, 2016, 12:23pm

I suspect that your contigs include your barcodes and primers. If you follow our wetlab SOP you don’t get separate files for the barcodes or even sequence the primers.

Egansbay · August 15, 2016, 2:23pm

Thanks Pat, the primers and barcodes have been removed, so it’s a 292bp amplicon from 251bp PE V4 reads after make.contigs.

Perhaps it’s as you say, that make.contigs has staggered to optimize overlap?

pschloss · August 17, 2016, 3:45pm

Not sure - I suspect the primers aren’t really removed since the V4 region, within the primers, is pretty consistently right around 250 nt long.

Egansbay · August 17, 2016, 5:28pm

Thanks Pat, so you don’t think that make.contigs has staggered in order to optimize the overlap in the V4 region and hence the amplicon appears as 292bp?

pschloss · August 29, 2016, 12:17pm

No, I’ve never seen a case where the majority of V4 contigs are anything but 250-255 nt. I suspect something else is going on here. Can you post the forward and reverse read for one of the sequences that assembles into a 290 nt contig?

Pat

Egansbay · August 30, 2016, 9:35am

Hi Pat

Here’s the summary.seqs output after make.contigs.

The primers used were 515F and 808R (~290bp), however these reads were trimmed to 251bp, and I verified their length at 251bp with FASTQC.

mothur > summary.seqs(fasta=rtgeseqs.trim.contigs.fasta)

Using 1 processors.

             Start  End      NBases   Ambigs   Polymer  NumSeqs
Minimum:     1      247      247      0        3        1
2.5%-tile:   1      290      290      0        4        573520
25%-tile:    1      291      291      0        4        5735197
Median:      1      291      291      0        4        11470394
75%-tile:    1      292      292      1        4        17205590
97.5%-tile:  1      292      292      13       6        22367267
Maximum:     1      502      502      235      251      22940786
Mean:        1      291.338  291.338  1.32831  4.2773

# of Seqs: 22940786

Output File Names:
rtgeseqs.trim.contigs.summary

It took 770 secs to summarize 22940786 sequences.

Thanks again for your help

pschloss · August 30, 2016, 12:07pm

I suspect those still have the primers attached. If you can post the forward and reverse sequence for one of the 290 nt contigs, I can have a look.

Pat

Egansbay · August 30, 2016, 12:22pm

Hi Pat

I think the primers, but not the adapters/linkers etc., are attached here. However, the seq length is 251bp.

Forward
GTGCCAGCCGCCGCGGTAATACATAGGATGCAAGCGTTATCCGGGTTTACTGGGCGTAAAGCGAGCGCAGGCGGATTTACAAGTCTGATGTTAAAGACAACTGCTTAACGGTTGTTTGCATTGGAAACTGTAAGTCTAGGGTATAGTAGAGAGTTTTGGAAATCCATGTGGAGCGGTGGAATGCGTAGATATATGGAAACACACCAGAGGCGAAGGCGACAACTTAGGCTATAACTGACGCTTACGCTCGA

Reverse
GGACTACACGGGTATCTAATCCTATTTGCTCCCCACACTCTCGAGCCTAAGCGTCAGTTATAGCCTAAGTTTTCGCCTTCGCCTCTGGTGTTCTTCCATATATCTACGCATTCCACCGCTCCACATGGAGTTCCACAACTCTCTACTATACCCTACACTTACAGTTTCCAATGCAAACAACAGTTTACACGTTCTCCTTAACAGCAGACTTCTAACTCCGCCTGCTCTCGCCTCACGCCCAGTACATCCCG

Much appreciate your help here.

Kendra · August 30, 2016, 7:56pm

Pat’s right. You still have primers:

forward
GTGCCAGCCGCCGCGGTAA

reverse
GGACTACACGGGTATCTAAT
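A quick way to run the same check yourself is to compare the start of each read against the primer. Here is a minimal Python sketch, assuming exact matches only (real primer sets often contain degenerate bases, which this ignores). The read strings are just the opening bases of the two reads posted above, and `has_primer` is a made-up helper, not a mothur function.

```python
FORWARD_PRIMER = "GTGCCAGCCGCCGCGGTAA"   # 19 nt, as seen at the start of the forward read
REVERSE_PRIMER = "GGACTACACGGGTATCTAAT"  # 20 nt, as seen at the start of the reverse read


def has_primer(read: str, primer: str) -> bool:
    """True if the read begins with the primer (exact match, case-insensitive)."""
    return read.upper().startswith(primer.upper())


# Opening bases of the forward and reverse reads posted in this thread.
forward_read = "GTGCCAGCCGCCGCGGTAATACATAGGATGCAAGCGTTAT"
reverse_read = "GGACTACACGGGTATCTAATCCTATTTGCTCCCCACACTC"

print(has_primer(forward_read, FORWARD_PRIMER))  # True
print(has_primer(reverse_read, REVERSE_PRIMER))  # True
```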

Egansbay · August 31, 2016, 9:04am

Thanks Pat and kmitchell

So, in this case is mothur reading the merged amplicon as 292bp as the first ~20bp aren’t overlapping, and so is staggering to focus on the sequence conservation in the overlapping area (i.e. 251bp)??

pschloss · September 6, 2016, 1:09pm

Sorry, I don’t know what you mean by “staggered”.

Your first read starts with GTGCCAGCCGCCGCGGTAA (19 nt) and your second read starts with the reverse complement of your reverse primer - GGACTACACGGGTATCTAAT (20 nt). make.contigs will merge the two reads to assemble the contig and if you give it an oligos file with your forward and reverse primers it will remove them to give you a product that is ~253 nt long. We don’t analyze the primer region since that sequence comes from the primer, not the bacterial DNA.

Pat
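The ~253 nt figure is just subtraction. A small Python sketch, assuming each contig carries the full forward primer at one end and the full reverse primer at the other; `trimmed_length` is a hypothetical helper for illustration only, not part of mothur.

```python
def trimmed_length(contig_len: int, fwd_primer: str, rev_primer: str) -> int:
    """Contig length left after removing one primer from each end."""
    return contig_len - len(fwd_primer) - len(rev_primer)


fwd = "GTGCCAGCCGCCGCGGTAA"   # 19 nt
rev = "GGACTACACGGGTATCTAAT"  # 20 nt

# 290-292 covers the bulk of the contig lengths in the summary.seqs table above.
for contig_len in (290, 291, 292):
    print(contig_len, "->", trimmed_length(contig_len, fwd, rev))
# 290 -> 251
# 291 -> 252
# 292 -> 253
```

Those trimmed lengths fall in the 250-255 nt range expected for V4 contigs once the primers are gone.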

Egansbay · September 6, 2016, 3:13pm

Thanks Pat

By ‘staggered’ I am referring to your previous comment on this thread (below).

The amplicon is likely ~292 nt long. make.contigs makes the contigs, so if you have 2 250 nt fragments it will stagger them to optimize the sequence conservation in the overlapping area.

Pat

If these reads are merged with the short primer reads still attached, will this cause classification issues?

Thank you
