I did not run unique.seqs at any point. My advisor suggested I skip that step so we could look more closely at abundance. Here are all the commands I ran:
mothur > make.file(inputdir = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics, type=fastq, prefix = stability)
mothur > make.contigs(file = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/stability.files )
mothur > summary.seqs(fasta = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/stability.trim.contigs.fasta , count = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/stability.contigs.count_table )
Using 10 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 35 35 0 2 1
2.5%-tile: 1 35 35 0 4 325544
25%-tile: 1 440 440 0 5 3255437
Median: 1 445 445 18 6 6510874
75%-tile: 1 465 465 21 6 9766311
97.5%-tile: 1 466 466 38 35 12696204
Maximum: 1 602 602 183 301 13021747
Mean: 1 424 424 13 7
# of unique seqs: 13021747
total # of seqs: 13021747
mothur > screen.seqs(fasta = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/stability.trim.contigs.fasta , count = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/stability.contigs.count_table , maxambig = 0, minlength = 200, maxlength = 466, maxhomop = 8)
mothur > summary.seqs(fasta = current, count = current)
Using /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/stability.contigs.good.count_table as input file for the count parameter.
Using /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/stability.trim.contigs.good.fasta as input file for the fasta parameter.
Using 10 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 200 200 0 3 1
2.5%-tile: 1 288 288 0 4 103523
25%-tile: 1 441 441 0 5 1035221
Median: 1 442 442 0 5 2070442
75%-tile: 1 465 465 0 6 3105663
97.5%-tile: 1 465 465 0 6 4037361
Maximum: 1 466 466 0 8 4140883
Mean: 1 436 436 0 5
# of unique seqs: 4140883
total # of seqs: 4140883
mothur > align.seqs(fasta = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/stability.trim.contigs.good.fasta , reference = /Users/joehansen/Documents/USA/MicrobiomeProject/Bioinformatics/silva.bacteria/silva.bacteria.fasta )
mothur > summary.seqs(fasta = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/stability.trim.contigs.good.align , count = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/stability.contigs.good.count_table )
Using 10 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 0 0 0 0 1 1
2.5%-tile: 6388 25316 10 0 3 103523
25%-tile: 6388 25316 441 0 4 1035221
Median: 6388 25316 442 0 5 2070442
75%-tile: 6388 25316 465 0 6 3105663
97.5%-tile: 43061 43116 465 0 6 4037361
Maximum: 43116 43116 466 0 8 4140883
Mean: 9434 26631 411 0 5
# of unique seqs: 4140883
total # of seqs: 4140883
It took 4324 secs to summarize 4140883 sequences.
mothur > screen.seqs(fasta = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/stability.trim.contigs.good.align , count = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/stability.contigs.good.count_table , start = 6388, end = 25316)
mothur > summary.seqs(fasta = current, count = current)
Using /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/stability.contigs.good.good.count_table as input file for the count parameter.
Using /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/stability.trim.contigs.good.good.align as input file for the fasta parameter.
Using 10 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 6099 25316 421 0 3 1
2.5%-tile: 6388 25316 440 0 4 92596
25%-tile: 6388 25316 442 0 4 925954
Median: 6388 25316 445 0 5 1851907
75%-tile: 6388 25316 465 0 6 2777860
97.5%-tile: 6388 25316 465 0 6 3611218
Maximum: 6388 26155 466 0 8 3703813
Mean: 6387 25316 451 0 5
# of unique seqs: 3703813
total # of seqs: 3703813
mothur > filter.seqs(fasta = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/stability.trim.contigs.good.good.align , vertical = T, trump=.)
It took 17646 secs to filter 3703813 sequences.
Length of filtered alignment: 1194
Number of columns removed: 48806
Length of the original alignment: 50000
Number of sequences used to construct filter: 3703813
mothur > summary.seqs(fasta = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.fasta , count = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.contigs.good.good.count_table )
Using 10 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 1192 421 0 3 1
2.5%-tile: 1 1194 440 0 4 92596
25%-tile: 1 1194 442 0 4 925954
Median: 1 1194 445 0 5 1851907
75%-tile: 1 1194 465 0 6 2777860
97.5%-tile: 1 1194 465 0 6 3611218
Maximum: 2 1194 466 0 8 3703813
Mean: 1 1193 451 0 5
# of unique seqs: 3703813
total # of seqs: 3703813
It took 109 secs to summarize 3703813 sequences.
mothur > pre.cluster(fasta = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.fasta , count = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.contigs.good.good.count_table , diffs = 4)
mothur > chimera.vsearch(fasta = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.fasta , count = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.count_table , dereplicate = T)
mothur > summary.seqs(fasta = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.fasta , count = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.denovo.vsearch.count_table )
Using 10 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 1192 421 0 3 1
2.5%-tile: 1 1194 440 0 4 92596
25%-tile: 1 1194 442 0 4 925954
Median: 1 1194 445 0 5 1851907
75%-tile: 1 1194 465 0 6 2777860
97.5%-tile: 1 1194 465 0 6 3611218
Maximum: 2 1194 466 0 8 3703813
Mean: 1 1193 451 0 5
# of unique seqs: 3703813
total # of seqs: 3703813
It took 109 secs to summarize 3703813 sequences.
mothur > classify.seqs(fasta = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.fasta , count = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.denovo.vsearch.count_table , reference = /Users/joehansen/Documents/USA/MicrobiomeProject/Bioinformatics/trainset18_062020.rdp/trainset18_062020.rdp.fasta , taxonomy = /Users/joehansen/Documents/USA/MicrobiomeProject/Bioinformatics/trainset18_062020.rdp/trainset18_062020.rdp.tax )
mothur > remove.lineage(fasta = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.fasta , count = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.denovo.vsearch.count_table, taxonomy = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.rdp.wang.taxonomy , taxon = Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)
mothur > summary.seqs(fasta = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.pick.fasta , count = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.denovo.vsearch.pick.count_table )
Using 10 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 1192 421 0 3 1
2.5%-tile: 1 1194 440 0 4 89648
25%-tile: 1 1194 442 0 4 896473
Median: 1 1194 445 0 5 1792946
75%-tile: 1 1194 465 0 6 2689418
97.5%-tile: 1 1194 465 0 6 3496243
Maximum: 2 1194 466 0 8 3585890
Mean: 1 1193 452 0 5
# of unique seqs: 3585890
total # of seqs: 3585890
mothur > summary.tax(taxonomy = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.rdp.wang.pick.taxonomy , count = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.denovo.vsearch.pick.count_table )
It took 103 secs to create the summary file for 3585890 sequences.
mothur > make.shared(count = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.denovo.vsearch.pick.count_table )
mothur > classify.otu(list = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.denovo.vsearch.pick.asv.list , count = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.denovo.vsearch.pick.count_table , taxonomy = /Volumes/FlashDrive/USA/MicrobiomeProject/Bioinformatics/ASV/stability.trim.contigs.good.good.filter.precluster.rdp.wang.pick.taxonomy , label = ASV)
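
In case it helps anyone reading the log, here is a rough Python sketch for pulling the unique and total sequence counts out of the summary.seqs output above. It is not part of the mothur run, and the function name `seq_counts` is made up for illustration. The counts are copied from the four distinct summaries in the log (raw contigs, first screen, second screen, after remove.lineage). Unique and total match at every step, which is what I would expect given that unique.seqs was never run.

```python
import re

def seq_counts(log_text):
    """Return (unique, total) pairs, one per summary.seqs block found in the text.

    Hypothetical helper: it only looks for the two count lines that
    summary.seqs prints at the end of each summary.
    """
    unique = re.findall(r"of unique seqs:\s*(\d+)", log_text)
    total = re.findall(r"total # of seqs:\s*(\d+)", log_text)
    return [(int(u), int(t)) for u, t in zip(unique, total)]

# Count lines copied from the log above, in order of appearance.
log = """
# of unique seqs: 13021747
total # of seqs: 13021747
# of unique seqs: 4140883
total # of seqs: 4140883
# of unique seqs: 3703813
total # of seqs: 3703813
# of unique seqs: 3585890
total # of seqs: 3585890
"""

counts = seq_counts(log)
start = counts[0][1]  # number of contigs before any screening

for unique, total in counts:
    # "same" is True when no dereplication has happened;
    # "kept" is the fraction of the starting contigs still present.
    print(f"unique={unique} total={total} same={unique == total} kept={total / start:.1%}")
```

Pasting the full mothur logfile into `log` instead of the excerpt should work the same way, as long as each summary still prints both count lines.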