Sequence length

So, my seqs have been trimmed to 251bp length before proceeding to Mothur analysis. FastQC confirms the length at 251bp.

However after make.contigs and then summary.seqs, I see that the majority of seqs are listed in the table as being 292bp length.

Might be interpreting something incorrectly, but could somebody please help me on this one?

Thank you

They aren’t fully overlapped. With 251 bp reads and a 292 bp contig, the first and last ~40 bp of each contig (292 - 251 = 41) are covered by only one read.

Thanks kmitchell

So even though all seqs have been trimmed to 251bp, if they do not overlap they’ll show up in the summary table at 292bp?

The amplicon is likely ~292 nt long. make.contigs makes the contigs, so if you have 2 250 nt fragments it will stagger them to optimize the sequence conservation in the overlapping area.
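The length arithmetic behind this can be sketched in a few lines. This is only an illustration of the geometry, not of how make.contigs aligns reads; the function names are made up for the example, and the 251/292 figures are the ones quoted in this thread.

```python
def overlap_length(fwd_len: int, rev_len: int, contig_len: int) -> int:
    """Bases covered by both reads when the pair spans the contig end to end."""
    return max(0, fwd_len + rev_len - contig_len)


def single_read_overhang(read_len: int, contig_len: int) -> int:
    """Bases at one end of the contig that the given read does not reach."""
    return max(0, contig_len - read_len)


# Numbers from this thread: 2 x 251 bp reads, contigs of ~292 bp.
print(overlap_length(251, 251, 292))     # bases seen by both reads
print(single_read_overhang(251, 292))    # bases at each end seen by one read only
```

So a contig longer than either read is expected whenever the fragment is longer than the read length; it does not by itself mean the merge failed.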

Pat

Thanks Pat

So even though these are 2 x 251bp v4 reads, the amplicon is staggered and showing up as 292bp?

Correct, the only relationship that should be true for sequencing on MiSeq/HiSeq is that your amplicon or fragment needs to be longer than the read length. You could technically sequence full-length 16S on a 2x25 kit. You wouldn’t get useful information out of that, but the machine would still give you gigs of >Q30 sequences. Illumina has decent videos on YouTube that may help you understand what’s going on with paired-end sequencing.

For diverse amplicons you should be aiming for a lot of overlap. Read any of the posts about the subpar results people are getting with V1-3 sequencing on MiSeq to get an understanding of why non-overlapping Q30 sequences are not good for amplicons.

Thanks kmitchell

I trimmed my seqs (f and r) to 251bp as the reverse read was showing a drop in quality (when screened with fastqc) after 251bp.

Do you think that my seqs have failed to merge correctly if they’re showing up at 292bp?

Just wondering if anybody can help with the last question as to whether my seqs have merged correctly if they’re showing up as 292bp length for 251bp PE reads?

Thanks

I suspect that your contigs include your barcodes and primers. If you follow our wetlab SOP you don’t get separate files for the barcodes or even sequence the primers.

Thanks Pat, the primers and barcodes have been removed, so it’s a 292bp amplicon from 251bp PE V4 reads after make.contigs.

Perhaps it’s as you say, that make.contigs has staggered to optimize overlap?

Not sure - I suspect the primers aren’t really removed since the V4 region, within the primers, is pretty consistently right around 250 nt long.

Thanks Pat, so you don’t think that make.contigs has staggered in order to optimize the overlap in the V4 region and hence the amplicon appears as 292bp?

No, I’ve never seen a case where the majority of V4 contigs are anything but 250-255 nt. I suspect something else is going on here. Can you post the forward and reverse read for one of the sequences that assembles into a 290 nt contig?

Pat

Hi Pat

Here’s the summary.seqs output after make.contigs.

The primers used were 515F and 808R (~290bp), however these reads were trimmed to 251bp, and I verified their length at 251bp with FASTQC.

mothur > summary.seqs(fasta=rtgeseqs.trim.contigs.fasta)

Using 1 processors.

             Start  End      NBases   Ambigs   Polymer  NumSeqs
Minimum:     1      247      247      0        3        1
2.5%-tile:   1      290      290      0        4        573520
25%-tile:    1      291      291      0        4        5735197
Median:      1      291      291      0        4        11470394
75%-tile:    1      292      292      1        4        17205590
97.5%-tile:  1      292      292      13       6        22367267
Maximum:     1      502      502      235      251      22940786
Mean:        1      291.338  291.338  1.32831  4.2773
# of Seqs:   22940786

Output File Names:
rtgeseqs.trim.contigs.summary

It took 770 secs to summarize 22940786 sequences.

Thanks again for your help

I suspect those still have the primers attached. If you can post the forward and reverse sequence for one of the 290 nt contigs, I can have a look.

Pat

Hi Pat

I think the primers, but not the adapters/linkers etc are attached here. However the seq length is 251bp.

Forward
GTGCCAGCCGCCGCGGTAATACATAGGATGCAAGCGTTATCCGGGTTTACTGGGCGTAAAGCGAGCGCAGGCGGATTTACAAGTCTGATGTTAAAGACAACTGCTTAACGGTTGTTTGCATTGGAAACTGTAAGTCTAGGGTATAGTAGAGAGTTTTGGAAATCCATGTGGAGCGGTGGAATGCGTAGATATATGGAAACACACCAGAGGCGAAGGCGACAACTTAGGCTATAACTGACGCTTACGCTCGA

Reverse
GGACTACACGGGTATCTAATCCTATTTGCTCCCCACACTCTCGAGCCTAAGCGTCAGTTATAGCCTAAGTTTTCGCCTTCGCCTCTGGTGTTCTTCCATATATCTACGCATTCCACCGCTCCACATGGAGTTCCACAACTCTCTACTATACCCTACACTTACAGTTTCCAATGCAAACAACAGTTTACACGTTCTCCTTAACAGCAGACTTCTAACTCCGCCTGCTCTCGCCTCACGCCCAGTACATCCCG

Much appreciate your help here.

Pat’s right. You still have primers:

forward
GTGCCAGCCGCCGCGGTAA

reverse
GGACTACACGGGTATCTAAT
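A quick way to check this yourself is to compare the start of each read against the degenerate primer. The sketch below is an assumption-laden illustration: it uses the commonly published 515F and 806R sequences (the thread says "808R", so substitute whatever primers you actually ordered), only the first 30 nt of the two reads posted above, and a made-up helper name.

```python
# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def starts_with_primer(read: str, primer: str) -> bool:
    """True if the read begins with a sequence compatible with the degenerate primer."""
    read = read.upper()
    if len(read) < len(primer):
        return False
    return all(base in IUPAC[code] for base, code in zip(read, primer))


# Assumed primer sequences (commonly published 515F / 806R); replace with your own.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"
PRIMER_806R = "GGACTACHVGGGTWTCTAAT"

# First 30 nt of the forward and reverse reads posted in this thread.
forward_start = "GTGCCAGCCGCCGCGGTAATACATAGGATG"
reverse_start = "GGACTACACGGGTATCTAATCCTATTTGCT"

print(starts_with_primer(forward_start, PRIMER_515F))
print(starts_with_primer(reverse_start, PRIMER_806R))
```

This only tests for an exact (degeneracy-aware) match at position 1; it does not tolerate mismatches or leading barcodes.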

Thanks Pat and kmitchell

So, in this case is mothur reading the merged amplicon as 292bp as the first ~20bp aren’t overlapping, and so is staggering to focus on the sequence conservation in the overlapping area (i.e. 251bp)??

Sorry, I don’t know what you mean by “staggered”.

Your first read starts with GTGCCAGCCGCCGCGGTAA (19 nt) and your second read starts with your reverse primer, GGACTACACGGGTATCTAAT (20 nt), which ends up as its reverse complement at the end of the contig. make.contigs will merge the two reads to assemble the contig, and if you give it an oligos file with your forward and reverse primers it will remove them to give you a product that is ~253 nt long (292 - 19 - 20 = 253). We don’t analyze the primer region since that sequence comes from the primer, not the bacterial DNA.
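To make the 292 to ~253 nt arithmetic concrete, here is a toy sketch that strips the two primer sequences quoted above from a contig. It is not how mothur does it (mothur works from the oligos file and allows mismatches); the helper names and the placeholder insert of 253 A's are invented for the example.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def trim_primers(contig: str, fwd_primer: str, rev_primer: str):
    """Remove an exact forward primer from the start and the reverse primer's
    reverse complement from the end; return None if either is missing."""
    rev_rc = reverse_complement(rev_primer)
    if contig.startswith(fwd_primer) and contig.endswith(rev_rc):
        return contig[len(fwd_primer):len(contig) - len(rev_rc)]
    return None


# Primer sequences as they appear at the start of the two reads in this thread.
FWD = "GTGCCAGCCGCCGCGGTAA"     # 19 nt
REV = "GGACTACACGGGTATCTAAT"    # 20 nt

# Hypothetical contig: forward primer + placeholder insert + reverse primer (rev-comp).
toy_contig = FWD + "A" * 253 + reverse_complement(REV)
trimmed = trim_primers(toy_contig, FWD, REV)

print(len(toy_contig), len(trimmed))
```

With the primers removed, the length falls back into the 250-255 nt range Pat describes for V4.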

Pat

Thanks Pat

By ‘staggered’ I am referring to your previous comment on this thread (below).

The amplicon is likely ~292 nt long. make.contigs makes the contigs, so if you have 2 250 nt fragments it will stagger them to optimize the sequence conservation in the overlapping area.

Pat

If these reads are merged with the short primer reads still attached, will this cause classification issues?

Thank you