Sanger 16S rRNA gene analysis

Hi. i’m done with the analysis on my sequence data from Sanger sequencing.
I’m new in sort of these analysis so i want to confirm if i did it each step correctly.
I referred to the example “Esophageal community analysis”.

I aligned the fasta files(almost full length) with silva.seed_v132.align as you’ve recommended. My purpose on the analysis was that after clustering into OTUs, getting the representative OTUs for further taxonomy assignment.
The followings is the commands that i used.

OTU clustering for cultured samples

step

  1. setting the directory
    set.dir(input=/Users/uihy/Downloads/silva.seed_v132)

    Mothur's directories:
    inputDir=/Users/uihy/Downloads/silva.seed_v132/
    
  2. alignment with silva alignment database
    align.seqs(fasta=stock_re.fasta, reference=silva.seed_v132.align, processors=2)

    Using 2 processors.
    
    Reading in the /Users/uihy/Downloads/silva.seed_v132/silva.seed_v132.align template sequences...	DONE.
    It took 12 to read  11180 sequences.
    
    Aligning sequences from /Users/uihy/Downloads/silva.seed_v132/stock_541.fasta ...
    It took 6 secs to align 541 sequences.
    
    
    It took 6 seconds to align 541 sequences.
    
    Output File Names: 
    /Users/uihy/Downloads/silva.seed_v132/stock_541.align
    /Users/uihy/Downloads/silva.seed_v132/stock_541.align.report
    
  3. summary the alignment file
    summary.seqs(stock_541.align)

    Using 2 processors.
    
    	Start	End	NBases	Ambigs	Polymer	NumSeqs
    Minimum:	1046	41517	1221	0	5	1
    2.5%-tile:	1148	41572	1249	0	5	14
    25%-tile:	1726	42565	1287	0	5	136
    Median: 	1769	42581	1309	0	6	271
    75%-tile:	2032	42622	1335	0	6	406
    97.5%-tile:	2086	42984	1372	0	6	528
    Maximum:	2540	43017	1402	0	9	541
    Mean:	1726	42563	1309	0	5
    # of Seqs:	541
    
    It took 0 secs to summarize 541 sequences.
    
    Output File Names:
    /Users/uihy/Downloads/silva.seed_v132/stock_541.summary
    
  4. get the same region by modifying the start point and end point
    screen.seqs(fasta=stock_541.align, group=stock541.groups,start=2086, end=41517, maxambig=5)

    Using 2 processors.
    
    It took 0 secs to screen 541 sequences, removed 13.
    
    /******************************************/
    Running command: remove.seqs(accnos=/Users/uihy/Downloads/silva.seed_v132/stock_541.bad.accnos.temp, group=/Users/uihyeontae/Downloads/silva.seed_v132/stock541.groups)
    Removed 13 sequences from your group file.
    
    Output File Names:
    /Users/uihy/Downloads/silva.seed_v132/stock541.pick.groups
    
    /******************************************/
    
    Output File Names:
    /Users/uihy/Downloads/silva.seed_v132/stock_541.good.align
    /Users/uihy/Downloads/silva.seed_v132/stock_541.bad.accnos
    /Users/uihy/Downloads/silva.seed_v132/stock541.good.groups
    
    
    It took 0 secs to screen 541 sequences.
    
  5. delete the gaps
    filter.seqs(fasta=stock_541.good.align, trump=., vertical=T)

    Using 2 processors.
    Creating Filter...
    It took 0 secs to create filter for 528 sequences.
    
    
    Running Filter...
    It took 0 secs to filter 528 sequences.
    
    Length of filtered alignment: 1387
    Number of columns removed: 48613
    Length of the original alignment: 50000
    Number of sequences used to construct filter: 528
    
    Output File Names: 
    /Users/uihy/Downloads/silva.seed_v132/stock_541.filter
    /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.fasta
    
  6. calculate the distance
    dist.seqs(fasta=stock_541.good.filter.fasta, output=lt)

    Using 2 processors.
    
    Sequence	Time	Num_Dists_Below_Cutoff
    
    It took 1 secs to find distances for 528 sequences. 139128 distances below cutoff 1.
    
    
    Output File Names: 
    /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.phylip.dist
    
  7. cluster the OTU (cutoff =98.7%)
    cluster(phylip=stock_541.good.filter.phylip.dist, cutoff=0.013)

    Using 2 processors.
    
    Clustering /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.phylip.dist
    
    iter	time	label	num_otus	cutoff	tp	tn	fp	fn	sensitivity	specificity	ppv	npv	fdr	accuracy	mcc	f1score
    
    0.013
    0	0	0.013	528	0.013	0	128888	0	10240	0	1	0	0.926399	1	0.926399	0	0	
    1	0	0.013	88	0.013	10229	128874	14	11	0.998926	0.999891	0.998633	0.999915	0.998633	0.99982	0.998683	0.998779	
    2	0	0.013	87	0.013	10236	128871	17	4	0.999609	0.999868	0.998342	0.999969	0.998342	0.999849	0.998894	0.998975	
    3	0	0.013	87	0.013	10236	128871	17	4	0.999609	0.999868	0.998342	0.999969	0.998342	0.999849	0.998894	0.998975	
    
    
    It took 0 seconds to cluster
    
    Output File Names: 
    /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.phylip.opti_mcc.list
    /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.phylip.opti_mcc.steps
    /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.phylip.opti_mcc.sensspec
    
  8. get the representative fasta files and .names files
    get.oturep(phylip=stock_541.good.filter.phylip.dist, list=stock_541.good.filter.phylip.opti_mcc.list, fasta=stock_541.fasta)

    You did not provide a label, using 0.013.
    0.013	87
    
    Output File Names: 
    /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.phylip.opti_mcc.0.013.rep.names
    /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.phylip.opti_mcc.0.013.rep.fasta
    
  9. get the representative OTU list with OTU number
    get.otulist(list=stock_541.good.filter.phylip.opti_mcc.list, label=0.013)

    Output File Names: 
    /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.phylip.opti_mcc.0.013.otu
    
Summary

This text will be hidden

Looks good to me. My only comment would be that maxambig should really never be anything other than zero.

Good work!
Pat

1 Like

Thank you for your reply as I’ve really needed one who can give feedback on my analysis process.

Mothur makes me to have great interest in getting started to analyze multiple sequences!

Thank you again for your feedback!

1 Like

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.