Hi. i’m done with the analysis on my sequence data from Sanger sequencing.
I’m new in sort of these analysis so i want to confirm if i did it each step correctly.
I referred to the example “Esophageal community analysis”.
I aligned the fasta files(almost full length) with silva.seed_v132.align as you’ve recommended. My purpose on the analysis was that after clustering into OTUs, getting the representative OTUs for further taxonomy assignment.
The followings is the commands that i used.
OTU clustering for cultured samples
step
-
setting the directory
set.dir(input=/Users/uihy/Downloads/silva.seed_v132)
Mothur's directories: inputDir=/Users/uihy/Downloads/silva.seed_v132/
-
alignment with silva alignment database
align.seqs(fasta=stock_re.fasta, reference=silva.seed_v132.align, processors=2)
Using 2 processors. Reading in the /Users/uihy/Downloads/silva.seed_v132/silva.seed_v132.align template sequences... DONE. It took 12 to read 11180 sequences. Aligning sequences from /Users/uihy/Downloads/silva.seed_v132/stock_541.fasta ... It took 6 secs to align 541 sequences. It took 6 seconds to align 541 sequences. Output File Names: /Users/uihy/Downloads/silva.seed_v132/stock_541.align /Users/uihy/Downloads/silva.seed_v132/stock_541.align.report
-
summary the alignment file
summary.seqs(stock_541.align)
Using 2 processors. Start End NBases Ambigs Polymer NumSeqs Minimum: 1046 41517 1221 0 5 1 2.5%-tile: 1148 41572 1249 0 5 14 25%-tile: 1726 42565 1287 0 5 136 Median: 1769 42581 1309 0 6 271 75%-tile: 2032 42622 1335 0 6 406 97.5%-tile: 2086 42984 1372 0 6 528 Maximum: 2540 43017 1402 0 9 541 Mean: 1726 42563 1309 0 5 # of Seqs: 541 It took 0 secs to summarize 541 sequences. Output File Names: /Users/uihy/Downloads/silva.seed_v132/stock_541.summary
-
get the same region by modifying the start point and end point
screen.seqs(fasta=stock_541.align, group=stock541.groups,start=2086, end=41517, maxambig=5)
Using 2 processors. It took 0 secs to screen 541 sequences, removed 13. /******************************************/ Running command: remove.seqs(accnos=/Users/uihy/Downloads/silva.seed_v132/stock_541.bad.accnos.temp, group=/Users/uihyeontae/Downloads/silva.seed_v132/stock541.groups) Removed 13 sequences from your group file. Output File Names: /Users/uihy/Downloads/silva.seed_v132/stock541.pick.groups /******************************************/ Output File Names: /Users/uihy/Downloads/silva.seed_v132/stock_541.good.align /Users/uihy/Downloads/silva.seed_v132/stock_541.bad.accnos /Users/uihy/Downloads/silva.seed_v132/stock541.good.groups It took 0 secs to screen 541 sequences.
-
delete the gaps
filter.seqs(fasta=stock_541.good.align, trump=., vertical=T)
Using 2 processors. Creating Filter... It took 0 secs to create filter for 528 sequences. Running Filter... It took 0 secs to filter 528 sequences. Length of filtered alignment: 1387 Number of columns removed: 48613 Length of the original alignment: 50000 Number of sequences used to construct filter: 528 Output File Names: /Users/uihy/Downloads/silva.seed_v132/stock_541.filter /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.fasta
-
calculate the distance
dist.seqs(fasta=stock_541.good.filter.fasta, output=lt)
Using 2 processors. Sequence Time Num_Dists_Below_Cutoff It took 1 secs to find distances for 528 sequences. 139128 distances below cutoff 1. Output File Names: /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.phylip.dist
-
cluster the OTU (cutoff =98.7%)
cluster(phylip=stock_541.good.filter.phylip.dist, cutoff=0.013)
Using 2 processors. Clustering /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.phylip.dist iter time label num_otus cutoff tp tn fp fn sensitivity specificity ppv npv fdr accuracy mcc f1score 0.013 0 0 0.013 528 0.013 0 128888 0 10240 0 1 0 0.926399 1 0.926399 0 0 1 0 0.013 88 0.013 10229 128874 14 11 0.998926 0.999891 0.998633 0.999915 0.998633 0.99982 0.998683 0.998779 2 0 0.013 87 0.013 10236 128871 17 4 0.999609 0.999868 0.998342 0.999969 0.998342 0.999849 0.998894 0.998975 3 0 0.013 87 0.013 10236 128871 17 4 0.999609 0.999868 0.998342 0.999969 0.998342 0.999849 0.998894 0.998975 It took 0 seconds to cluster Output File Names: /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.phylip.opti_mcc.list /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.phylip.opti_mcc.steps /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.phylip.opti_mcc.sensspec
-
get the representative fasta files and .names files
get.oturep(phylip=stock_541.good.filter.phylip.dist, list=stock_541.good.filter.phylip.opti_mcc.list, fasta=stock_541.fasta)
You did not provide a label, using 0.013. 0.013 87 Output File Names: /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.phylip.opti_mcc.0.013.rep.names /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.phylip.opti_mcc.0.013.rep.fasta
-
get the representative OTU list with OTU number
get.otulist(list=stock_541.good.filter.phylip.opti_mcc.list, label=0.013)
Output File Names: /Users/uihy/Downloads/silva.seed_v132/stock_541.good.filter.phylip.opti_mcc.0.013.otu
Summary
This text will be hidden