Sequence length range for V4 region analysis

DEEPCHANDA7 · May 22, 2020, 8:50pm

Hi community!!!

I’m analysing V4 region sequences. Since this region is around 253 bp long, what length range should I use in the first “screen.seqs” step to remove bad sequences? I’m using minlength=230, maxlength=275. Is that OK?
Also, why does the first “screen.seqs” step of the mothur SOP use maxlength=275?

Thanks & Regards,
DC7
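The length part of a first screen.seqs pass keeps a sequence only if its length falls inside the chosen window. Below is a minimal Python sketch of that idea. It assumes both bounds are inclusive and ignores the other screen.seqs criteria. The helper `in_length_window` and the toy lengths are hypothetical.

```python
def in_length_window(length: int, minlength: int = 230, maxlength: int = 275) -> bool:
    """Return True if a sequence length lies inside the window.

    Hypothetical helper; assumes both bounds are inclusive.
    Defaults are the values proposed in the question.
    """
    return minlength <= length <= maxlength


# Toy contig lengths (illustrative only, not real data).
lengths = [229, 230, 253, 275, 276, 301]
kept = [n for n in lengths if in_length_window(n)]
print(kept)  # [230, 253, 275]
```

Under the inclusive-bound assumption, 230 and 275 themselves are kept, while 229 and 276 are dropped.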

pschloss · May 25, 2020, 5:23pm

Hi,

There aren’t any good sequences in the database for the V4 region (with priming sites removed) that are longer than 275 nt.

Pat

DEEPCHANDA7 · May 25, 2020, 6:54pm

Thank you for your reply, sir. But what minimum length should we use in an analysis?

pschloss · May 25, 2020, 9:18pm

With paired 250 nt reads, I’m not sure it’s possible to get much below 250 nt. Probably 240 on the low end.

DEEPCHANDA7 · May 26, 2020, 1:25pm

Sir, thanks again, and pardon the repetitive questions. What if I use a broader length range?
When I analysed the V4-V6 region (~550 bp), I used minlength=525, maxlength=575 for the “screen.seqs” command. Would that be considered wrong? What would your comment be as a reviewer?

mothur > summary.seqs(fasta=current)
Using /media/dc7/New Volume/OBESITY/prjna321731_16s/prjna321731_16s_nw/merge.paired.trim.contigs.fasta as input file for the fasta parameter.

Using 8 processors.

               Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:       1      301  301     0       3        1
2.5%-tile:     1      542  542     0       4        15495
25%-tile:      1      544  544     0       5        154948
Median:        1      547  547     0       5        309896
75%-tile:      1      550  550     2       5        464843
97.5%-tile:    1      553  553     7       7        604296
Maximum:       1      602  602     62      299      619790
Mean:          1      546  546     1       5
# of Seqs:     619790

It took 22 secs to summarize 619790 sequences.

Output File Names:
/media/dc7/New Volume/OBESITY/prjna321731_16s/prjna321731_16s_nw/merge.paired.trim.contigs.summary

mothur > screen.seqs(fasta=current, group=current, maxambig=0, maxhomop=8, minlength=525, maxlength=575)
Using /media/dc7/New Volume/OBESITY/prjna321731_16s/prjna321731_16s_nw/merge.paired.trim.contigs.fasta as input file for the fasta parameter.
Using /media/dc7/New Volume/OBESITY/prjna321731_16s/prjna321731_16s_nw/merge.paired.contigs.groups as input file for the group parameter.

Using 8 processors.

It took 7 secs to screen 619790 sequences, removed 286319.

Thanks and Regards,
DC7
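To see what the screen.seqs call above is checking, here is a hypothetical Python approximation of its criteria: ambiguous-base count, longest homopolymer, and the length window. It is not mothur's code. It assumes unaligned sequences with no gap characters and inclusive bounds. The function names are illustrative. The last lines compute the fraction removed from the counts in the log.

```python
from itertools import groupby


def count_ambigs(seq: str) -> int:
    """Count bases other than A, C, G, T (e.g. N or other IUPAC codes)."""
    return sum(1 for base in seq.upper() if base not in "ACGT")


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


def passes_screen(seq: str, maxambig: int = 0, maxhomop: int = 8,
                  minlength: int = 525, maxlength: int = 575) -> bool:
    """Hypothetical approximation of the screen.seqs call above.

    Assumes unaligned sequences (no gap characters) and inclusive bounds.
    """
    return (count_ambigs(seq) <= maxambig
            and longest_homopolymer(seq) <= maxhomop
            and minlength <= len(seq) <= maxlength)


# Fraction removed in the log above: 286319 of 619790 contigs.
total, removed = 619790, 286319
print(f"removed {removed / total:.1%}, kept {total - removed}")
# removed 46.2%, kept 333471
```

Reading the summary.seqs table, the 2.5%-tile to 97.5%-tile lengths (542 to 553) sit well inside 525 to 575, while the 75%-tile of Ambigs is 2. So at least a quarter of the contigs would fail maxambig=0 on their own. Presumably the ambiguity filter, rather than the length window, accounts for most of the ~46% removed. This is an inference from the percentiles; the log does not state it directly.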

pschloss · May 28, 2020, 12:37pm

You would need to generate a reference alignment for V4-V6 (without the primers on the sequences) and then run it through summary.seqs. The output would show you what ranges you would expect (keep in mind that there might be some weird outliers).

Pat
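The suggestion above is to run summary.seqs on primer-trimmed reference sequences and read the expected length range from its output. As a cross-check outside mothur, the same percentile rows can be sketched in Python. `summary_rows` is a hypothetical helper. It assumes you already have the trimmed reference lengths as integers, and it infers the percentile rule from the NumSeqs column in the summary.seqs output earlier in this thread, not from mothur's source. The Mean row is not computed.

```python
def summary_rows(lengths):
    """summary.seqs-style percentile rows for a non-empty list of lengths.

    Assumption: the row for fraction p reports the value at 0-based index
    floor(n * p) of the sorted list, and NumSeqs is that index + 1. This
    reproduces the NumSeqs column in the thread's log, but it is inferred,
    not taken from mothur's code.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths)
    n = len(ordered)
    rows = {}
    for label, per_mille in [("Minimum", 0), ("2.5%-tile", 25),
                             ("25%-tile", 250), ("Median", 500),
                             ("75%-tile", 750), ("97.5%-tile", 975)]:
        idx = n * per_mille // 1000  # integer math avoids float rounding
        rows[label] = (ordered[idx], idx + 1)
    rows["Maximum"] = (ordered[-1], n)
    return rows


# Toy trimmed reference lengths (illustrative only, not real data).
ref_lengths = [553, 542, 547, 550, 544]
for label, (value, rank) in summary_rows(ref_lengths).items():
    print(f"{label:<11} {value:>5} {rank:>7}")
```

As the reply above notes, the extreme rows (Minimum, Maximum) may reflect odd outliers in the reference set. The 2.5% and 97.5% rows are usually a safer guide when choosing minlength and maxlength.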

system · June 7, 2020, 12:45pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.