mothur

Sequence length range for V4 region analysis

Hi community!!!

  • I’m analysing V4 region sequences. Since this region is around 253 bp long, what range should I use in the first “screen.seqs” step to remove bad sequences? I’m using “(minlength=230, maxlength=275)”. Is that OK?

  • Why does the first “screen.seqs” step of the mothur SOP use maxlength=275?

Thanks & Regards,
DC7

Hi,

There aren’t any good sequences in the database for the V4 region (with priming sites removed) that are longer than 275 nt.

Pat

Thank you for your reply, sir. But what minimum length should we use in an analysis?

With paired 250 nt reads, I’m not sure it’s possible to get much below 250 nt. Probably 240 on the low end.

Sir, firstly thanks, and secondly pardon the repetitive questions. What if I use a broader length range?
When I analysed the V4-V6 region (550 bp), I used minlength=525, maxlength=575 in the “screen.seqs” command. Is that wrong? What would your comment be as a reviewer?

mothur > summary.seqs(fasta=current)
Using /media/dc7/New Volume/OBESITY/prjna321731_16s/prjna321731_16s_nw/merge.paired.trim.contigs.fasta as input file for the fasta parameter.

Using 8 processors.

                Start   End     NBases  Ambigs  Polymer NumSeqs
Minimum:        1       301     301     0       3       1
2.5%-tile:      1       542     542     0       4       15495
25%-tile:       1       544     544     0       5       154948
Median:         1       547     547     0       5       309896
75%-tile:       1       550     550     2       5       464843
97.5%-tile:     1       553     553     7       7       604296
Maximum:        1       602     602     62      299     619790
Mean:           1       546     546     1       5
# of Seqs:      619790

It took 22 secs to summarize 619790 sequences.

Output File Names:
/media/dc7/New Volume/OBESITY/prjna321731_16s/prjna321731_16s_nw/merge.paired.trim.contigs.summary

mothur > screen.seqs(fasta=current, group=current, maxambig=0, maxhomop=8, minlength=525, maxlength=575)
Using /media/dc7/New Volume/OBESITY/prjna321731_16s/prjna321731_16s_nw/merge.paired.trim.contigs.fasta as input file for the fasta parameter.
Using /media/dc7/New Volume/OBESITY/prjna321731_16s/prjna321731_16s_nw/merge.paired.contigs.groups as input file for the group parameter.

Using 8 processors.

It took 7 secs to screen 619790 sequences, removed 286319.

Thanks and Regards,
DC7
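
The screen.seqs call above applies four limits at once (minlength, maxlength, maxambig, maxhomop), so the single "removed 286319" figure does not show how much of the loss comes from the length window. One way to look at that is to tally the reasons separately. The following is only a rough Python sketch, not mothur code: `longest_homopolymer`, `rejection_reasons` and `tally` are hypothetical helper names, and it assumes that any character other than A, C, G, T or U counts as an ambiguous base. The demo uses tiny made-up sequences and small limits so it can be checked by eye.

```python
from collections import Counter


def longest_homopolymer(seq):
    """Return the length of the longest run of one repeated character."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def rejection_reasons(seq, minlength=525, maxlength=575, maxambig=0, maxhomop=8):
    """List the criteria a sequence fails; an empty list means it is kept.

    Defaults mirror the screen.seqs call in this thread.
    Assumption: anything other than A/C/G/T/U is treated as ambiguous.
    """
    seq = seq.upper()
    reasons = []
    if len(seq) < minlength:
        reasons.append("too_short")
    if len(seq) > maxlength:
        reasons.append("too_long")
    if sum(1 for base in seq if base not in "ACGTU") > maxambig:
        reasons.append("ambig")
    if longest_homopolymer(seq) > maxhomop:
        reasons.append("homop")
    return reasons


def tally(seqs, **limits):
    """Count kept/removed sequences and how often each criterion fails."""
    counts = Counter()
    for seq in seqs:
        reasons = rejection_reasons(seq, **limits)
        counts["removed" if reasons else "kept"] += 1
        counts.update(reasons)  # a read can fail more than one criterion
    return counts


# Toy data only; these are not sequences from the dataset in this thread.
demo_seqs = [
    "ACGTACGTAC",        # passes
    "ACGNACGTAC",        # one ambiguous base
    "ACG",               # too short
    "AAAAAAAAAACG",      # homopolymer run of 10
    "ACGTACGTACGTACGT",  # too long
]
demo_counts = tally(demo_seqs, minlength=8, maxlength=12, maxambig=0, maxhomop=8)
print(demo_counts["kept"], demo_counts["removed"])  # prints: 1 4
```

Because a read can fail several criteria, the per-reason counts can add up to more than the number removed. In the summary above the 75%-tile for Ambigs is already 2, so with maxambig=0 a sizeable share of the removals may have nothing to do with the length range.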

You would need to generate a reference alignment for V4-V6 (without the primers on the sequences) and then run it through summary.seqs. The output would show you what ranges you would expect (keep in mind that there might be some weird outliers).

Pat
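
To illustrate the kind of numbers summary.seqs would give for such a reference, here is a small Python sketch that reads an aligned FASTA, drops the gap characters (`.` and `-`) and reports length percentiles. It is only an approximation for illustration: `read_fasta`, `ungapped_length` and `length_summary` are hypothetical helper names, the percentile rule is a simple sorted-index lookup that may not match mothur's exactly, and the three reference sequences are made up.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def ungapped_length(seq):
    """Count bases, ignoring alignment gap characters '.' and '-'."""
    return sum(1 for char in seq if char not in ".-")


def length_summary(lengths, quantiles=(0.0, 0.025, 0.25, 0.5, 0.75, 0.975, 1.0)):
    """Map each quantile to a length using a simple sorted-index lookup."""
    ordered = sorted(lengths)
    if not ordered:
        raise ValueError("no sequences to summarize")
    last = len(ordered) - 1
    return {q: ordered[min(last, int(q * len(ordered)))] for q in quantiles}


# Made-up aligned reference, standing in for a trimmed V4-V6 alignment.
toy_alignment = io.StringIO(
    ">ref1\n"
    "AC--GT..ACGT\n"
    ">ref2\n"
    "ACGTACGT\n"
    "ACGT\n"
    ">ref3\n"
    "A-C-G\n"
)
lengths = [ungapped_length(seq) for _, seq in read_fasta(toy_alignment)]
summary = length_summary(lengths)
print(lengths)       # prints: [8, 12, 3]
print(summary[0.5])  # prints: 8
```

On a real reference, the 2.5% and 97.5% values would be a reasonable starting point for minlength and maxlength, while the minimum and maximum are where the weird outliers would show up.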