Hi Pat,
A bit long but here it is:
####***********
- Combine reads:
mothur > make.contigs(file=jpk.stability.files,processors=8)
97 522
98 57349
99 12908
Total of all groups is 4961319
Output File Names:
jpk.stability.trim.contigs.fasta
jpk.stability.trim.contigs.qual
jpk.stability.contigs.report
jpk.stability.scrap.contigs.fasta
jpk.stability.scrap.contigs.qual
jpk.stability.contigs.groups
[WARNING]: your sequence names contained ‘:’. I changed them to ‘_’ to avoid problems in your downstream analysis.
####***********
2. Look at summary stats
mothur > summary.seqs(fasta=jpk.stability.trim.contigs.fasta)
.
.
Using 8 processors.
summary.seqs(fasta=jpk.stability.trim.contigs.fasta)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 244 244 0 2 1
2.5%-tile: 1 252 252 0 4 124033
25%-tile: 1 253 253 0 4 1240330
Median: 1 253 253 0 4 2480660
75%-tile: 1 253 253 0 5 3720990
97.5%-tile: 1 253 253 9 6 4837287
Maximum: 1 502 502 97 251 4961319
Mean: 1 253.055 253.055 0.900051 4.48268
of Seqs: 4961319
Output File Names:
jpk.stability.trim.contigs.summary
It took 55 secs to summarize 4961319 sequences.
####***********
3. mothur > screen.seqs(fasta=jpk.stability.trim.contigs.fasta, group=jpk.stability.contigs.groups, maxambig=0, maxlength=275)
.
.
Processing sequence: 620000
Processing sequence: 620100
Processing sequence: 620128
Output File Names:
jpk.stability.trim.contigs.good.fasta
jpk.stability.trim.contigs.bad.accnos
jpk.stability.contigs.good.groups
It took 66 secs to screen 4961319 sequences.
####***********
4.
mothur > unique.seqs(fasta=jpk.stability.trim.contigs.good.fasta)
.
.
3791000 257984
3792000 258022
3792821 258046
Output File Names:
jpk.stability.trim.contigs.good.names
jpk.stability.trim.contigs.good.unique.fasta
####***********
5.
mothur > get.current()
####***********
6.
mothur > count.seqs(name=jpk.stability.trim.contigs.good.names, group=jpk.stability.contigs.good.groups)
Using 8 processors.
It took 19 secs to create a table for 3792821 sequences.
Total number of sequences: 3792821
Output File Names:
jpk.stability.trim.contigs.good.count_table
####***********
- summary.seqs(count=jpk.stability.trim.contigs.good.count_table)
Using jpk.stability.trim.contigs.good.unique.fasta as input file for the fasta parameter.
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 250 250 0 3 1
2.5%-tile: 1 252 252 0 4 94821
25%-tile: 1 253 253 0 4 948206
Median: 1 253 253 0 4 1896411
75%-tile: 1 253 253 0 5 2844616
97.5%-tile: 1 253 253 0 6 3698001
Maximum: 1 273 273 0 9 3792821
Mean: 1 252.8 252.8 0 4.48627
# of unique seqs: 258046
total # of seqs: 3792821
Output File Names:
jpk.stability.trim.contigs.good.unique.summary
It took 4 secs to summarize 3792821 sequences.
####***********
8.
mothur > pcr.seqs(fasta=/Users/pca/Documents/microbiome/jpk/silva.bacteria/silva.bacteria.fasta, start=11894, end=25319, keepdots=F, processors=8)
.
.
Processing sequence: 1870
Processing sequence: 1869
Output File Names:
/Users/pca/Documents/microbiome/jpk/silva.bacteria/silva.bacteria.pcr.fasta
It took 27 secs to screen 14956 sequences.
####***********
9.
mothur > system(cp /Users/pca/Documents/microbiome/jpk/silva.bacteria/silva.bacteria.pcr.fasta /Users/pca/Documents/microbiome/jpk/silva.bacteria/silva.v4.fasta)
mothur > summary.seqs(fasta=/Users/pca/Documents/microbiome/jpk/silva.bacteria/silva.v4.fasta)
.
.
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 13424 270 0 3 1
2.5%-tile: 1 13425 292 0 4 374
25%-tile: 1 13425 293 0 4 3740
Median: 1 13425 293 0 4 7479
75%-tile: 1 13425 293 0 5 11218
97.5%-tile: 1 13425 294 1 6 14583
Maximum: 3 13425 351 5 9 14956
Mean: 1.00074 13425 292.977 0.0573014 4.57014
of Seqs: 14956
Output File Names:
/Users/pca/Documents/microbiome/jpk/silva.bacteria/silva.v4.summary
It took 1 secs to summarize 14956 sequences.
####***********
10.
mothur > align.seqs(fasta=jpk.stability.trim.contigs.good.unique.fasta, reference=/Users/pca/Documents/microbiome/jpk/silva.bacteria/silva.v4.fasta)
.
.
32256
32264
32256
[WARNING]: Some of your sequences generated alignments that eliminated too many bases, a list is provided in jpk.stability.trim.contigs.good.unique.flip.accnos. If you set the flip parameter to true mothur will try aligning the reverse compliment as well.
It took 192 secs to align 258046 sequences.
Output File Names:
jpk.stability.trim.contigs.good.unique.align
jpk.stability.trim.contigs.good.unique.align.report
jpk.stability.trim.contigs.good.unique.flip.accnos
####***********
11.
summary.seqs(fasta=jpk.stability.trim.contigs.good.unique.align, count=jpk.stability.trim.contigs.good.count_table)
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 1 1 0 1 1
2.5%-tile: 1968 11550 252 0 4 94821
25%-tile: 1968 11550 253 0 4 948206
Median: 1968 11550 253 0 4 1896411
75%-tile: 1968 11550 253 0 5 2844616
97.5%-tile: 1968 11550 253 0 6 3698001
Maximum: 13425 13425 273 0 9 3792821
Mean: 1968.01 11550 252.799 0 4.48626
of unique seqs: 258046
total # of seqs: 3792821
Output File Names:
jpk.stability.trim.contigs.good.unique.summary
It took 68 secs to summarize 3792821 sequences.
####***********
12.
screen.seqs(fasta=jpk.stability.trim.contigs.good.unique.align, count=jpk.stability.trim.contigs.good.count_table, summary=jpk.stability.trim.contigs.good.unique.summary, start=1968, end=11550, maxhomop=8)
.
.
Processing sequence: 32256
Processing sequence: 32200
Processing sequence: 32256
Output File Names:
jpk.stability.trim.contigs.good.unique.good.summary
jpk.stability.trim.contigs.good.unique.good.align
jpk.stability.trim.contigs.good.unique.bad.accnos
jpk.stability.trim.contigs.good.good.count_table
It took 183 secs to screen 258046 sequences.
###***
13. mothur > summary.seqs(fasta=current, count=current)
Using jpk.stability.trim.contigs.good.good.count_table as input file for the count parameter.
Using jpk.stability.trim.contigs.good.unique.good.align as input file for the fasta parameter.
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1233 11550 250 0 3 1
2.5%-tile: 1968 11550 252 0 4 94782
25%-tile: 1968 11550 253 0 4 947816
Median: 1968 11550 253 0 4 1895632
75%-tile: 1968 11550 253 0 5 2843448
97.5%-tile: 1968 11550 253 0 6 3696482
Maximum: 1968 13424 273 0 8 3791263
Mean: 1968 11550 252.8 0 4.48624
of unique seqs: 257356
total # of seqs: 3791263
Output File Names:
jpk.stability.trim.contigs.good.unique.good.summary
It took 65 secs to summarize 3791263 sequences.
####***********
14. mothur > filter.seqs(fasta=jpk.stability.trim.contigs.good.unique.good.align, vertical=T, trump=.)
.
.
31900
32000
32100
32170
Length of filtered alignment: 501
Number of columns removed: 12924
Length of the original alignment: 13425
Number of sequences used to construct filter: 257356
Output File Names:
jpk.filter
jpk.stability.trim.contigs.good.unique.good.filter.fasta
This means that our initial alignment was 13425 columns wide and that we were able to remove 12924 terminal gap characters using trump=. and vertical gap characters using vertical=T. The final alignment length is 501 columns. Because we’ve perhaps created some redundancy across our sequences by trimming the ends, we can re-run unique.seqs:
####***********
15.
mothur > unique.seqs(fasta=jpk.stability.trim.contigs.good.unique.good.filter.fasta, count=jpk.stability.trim.contigs.good.good.count_table)
.
.
256000 255807
257000 256806
257356 257162
Output File Names:
jpk.stability.trim.contigs.good.unique.good.filter.count_table
jpk.stability.trim.contigs.good.unique.good.filter.unique.fasta
####***********
16.
mothur > pre.cluster(fasta=jpk.stability.trim.contigs.good.unique.good.filter.unique.fasta, count=jpk.stability.trim.contigs.good.unique.good.filter.count_table, diffs=2)
.
.
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.97.map
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.98.map
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.99.map
####***********
17.
mothur > chimera.uchime(fasta=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)
.
.
It took 12 secs to check 1144 sequences from group 74.
uchime v4.2.40
by Robert C. Edgar
http://drive5.com/uchime
This code is donated to the public domain.
00:00 835.6 100.0% Reading jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.temp5588.temp
00:00 847.9 302 sequences
00:01 1.6Mb 100.0% 85/301 chimeras found (28.1%)
It took 1 secs to check 302 sequences from group 75.
Output File Names:
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.count_table
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.chimeras
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.accnos
####***********
18.
mothur > remove.seqs(fasta=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta, accnos=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.accnos)
[WARNING]: This command can take a namefile and you did not provide one. The current namefile is jpk.stability.trim.contigs.good.names which seems to match jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta.
Removed 32751 sequences from your fasta file.
Output File Names:
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta
####***********
19.
mother > summary.seqs(fasta=current, count=current)
Using jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.count_table as input file for the count parameter.
Using jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta as input file for the fasta parameter.
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 501 241 0 3 1
2.5%-tile: 1 501 252 0 4 89009
25%-tile: 1 501 253 0 4 890086
Median: 1 501 253 0 4 1780172
75%-tile: 1 501 253 0 5 2670258
97.5%-tile: 1 501 253 0 6 3471335
Maximum: 1 501 258 0 8 3560343
Mean: 1 501 252.802 0 4.47901
of unique seqs: 38520
total # of seqs: 3560343
Output File Names:
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.summary
It took 0 secs to summarize 3560343 sequences.
####***********
20.
mothur > classify.seqs(fasta=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.count_table, reference=/Users/pca/Documents/microbiome/jpk/RDP_ref/trainset14_032015.pds/trainset14_032015.pds.fasta, taxonomy=/Users/pca/Documents/microbiome/jpk/RDP_ref/trainset14_032015.pds/trainset14_032015.pds.tax, cutoff=80)
.
.
Processing sequence: 38400
Processing sequence: 38500
Processing sequence: 38520
It took 306 secs to classify 38520 sequences.
It took 3 secs to create the summary file for 38520 sequences.
Output File Names:
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.tax.summary
####***********
21.
Now that everything is classified we want to remove our undesirables. We do this with the remove.lineage command:
mothur > remove.lineage(fasta=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.count_table, taxonomy=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)
[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.
Output File Names:
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table
####***********
22.
mothur > summary.seqs(fasta=current, count=current)
Using jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table as input file for the count parameter.
Using jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta as input file for the fasta parameter.
Using 1 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 501 241 0 3 1
2.5%-tile: 1 501 252 0 4 88907
25%-tile: 1 501 253 0 4 889066
Median: 1 501 253 0 4 1778132
75%-tile: 1 501 253 0 5 2667197
97.5%-tile: 1 501 253 0 6 3467356
Maximum: 1 501 258 0 8 3556262
Mean: 1 501 252.8 0 4.47842
of unique seqs: 38423
total # of seqs: 3556262
Output File Names:
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.summary
It took 1 secs to summarize 3556262 sequences.
####***********
23.
mothur > dist.seqs(fasta=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, cutoff=0.20)
.
.
38300 2680
38400 2691
38422 2695
Output File Names:
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist
It took 2695 seconds to calculate the distances for 38423 sequences.
cluster(column=stinki_stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist, count=stinki_stability.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table)
######**********
24.
mothur > cluster(column=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist, count=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table)
.
.
0 0 0 0 1
changed cutoff to 0.075503
It took 17100 seconds to cluster
Output File Names:
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list
###************
25.
mothur > make.shared(list=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table, label=0.03)
.
.
0.03
Output File Names:
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.shared
###**** OTUS
26.
We probably also want to know the taxonomy for each of our OTUs. We can get the consensus taxonomy for each OTU using the classify.otu command:
mothur > classify.otu(list=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table, taxonomy=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, label=0.03)
0.03 2216
Output File Names:
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.cons.taxonomy
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.cons.tax.summary
####***** PHYLOTYPES
27.
mothur > phylotype(taxonomy=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy)
1
2
3
4
5
6
Output File Names:
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.tx.sabund
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.tx.rabund
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.tx.list
######*********
28.
mothur > make.shared(list=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.tx.list, count=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table, label=1)
1
Output File Names:
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.tx.shared
We also want to know who these OTUs are and can run classify.otu on our phylotypes:
mothur > classify.otu(list=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.tx.list, count=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.pick.pick.count_table, taxonomy=jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy, label=1)
.
.
1 185
Output File Names:
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.tx.1.cons.taxonomy
jpk.stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.pick.tx.1.cons.tax.summary