Hi,
I would be most grateful for your help please. I am getting very different start and end (1 - 502 after the remove.seqs; previously 1 - 249 after screen.seqs) from your MiSEQ SOP. I am using the v4 region, and the start and end has been almost the same for the all the previous steps.
These are the instructions and output so far, notably, I did attempt to go through the MiSeq mothur SOP over two separate timings, and wondering if I have missed a step/used the wrong files!
[NOTE]: Setting random seed to 19760620.
Interactive Mode
mothur > set.dir(input=/data/M
M01359_150_000000000-APKK7_1_1101_12712_1956 M01359_150_000000000-APKK7_1_1101_16430_1874
M01359_150_000000000-APKK7_1_1101_14955_1878 M01359_150_000000000-APKK7_1_1101_18618_1866
M01359_150_000000000-APKK7_1_1101_15214_1945 MMP_gutmicrobiome_2020/
mothur > set.dir(input=/data/MMP_gutmicrobiome_2020/)
Mothur's directories:
inputDir=/data/MMP_gutmicrobiome_2020/
Mothur > make.contigs(file=inputMMPHC.txt)
Total of all groups is 12889892
It took 4137 secs to process 12889892 sequences.
Output File Names:
inputfileMMPHC.trim.contigs.fasta
inputfileMMPHC.scrap.contigs.fasta
inputfileMMPHC.contigs.report
inputfileMMPHC.contigs.groups
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 244 244 0 3 1
2.5%-tile: 1 252 252 0 3 322248
25%-tile: 1 253 253 0 4 3222474
Median: 1 253 253 0 4 6444947
75%-tile: 1 253 253 0 5 9667420
97.5%-tile: 1 253 253 8 6 12567645
Maximum: 1 502 502 116 251 12889892
Mean: 1 252 252 0 4
# of Seqs: 12889892
It took 101 secs to summarize 12889892 sequences.
Output File Names:
inputfileMMPHC.trim.contigs.summary
mothur > screen.seqs(fasta=inputfileMMPHC.trim.contigs.fasta , group=inputfileMMPHC.contigs.groups , summary=inputfileMMPHC.trim.contigs.summary , maxambig=0, maxlength=275)
Using 8 processors.
It took 136 secs to screen 12889892 sequences, removed 2428829.
/******************************************/
Running command: remove.seqs(accnos=inputfileMMPHC.trim.contigs.bad.accnos.temp, group=inputfileMMPHC.contigs.groups)
Removed 2428829 sequences from your group file.
Output File Names:
inputfileMMPHC.contigs.pick.groups
/******************************************/
Output File Names:
inputfileMMPHC.trim.contigs.good.summary
inputfileMMPHC.trim.contigs.good.fasta
inputfileMMPHC.trim.contigs.bad.accnos
inputfileMMPHC.contigs.good.groups
It took 275 secs to screen 12889892 sequences.
mothur > summary.seqs(fasta=inputfileMMPHC.trim.contigs.good.fasta )
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 249 249 0 3 1
2.5%-tile: 1 252 252 0 3 261527
25%-tile: 1 253 253 0 4 2615266
Median: 1 253 253 0 4 5230532
75%-tile: 1 253 253 0 5 7845798
97.5%-tile: 1 253 253 0 6 10199537
Maximum: 1 274 274 0 11 10461063
Mean: 1 252 252 0 4
# of Seqs: 10461063
It took 86 secs to summarize 10461063 sequences.
Output File Names:
inputfileMMPHC.trim.contigs.good.summary
mothur > unique.seqs(fasta=inputfileMMPHC.trim.contigs.good.fasta )
Output File Names:
inputfileMMPHC.trim.contigs.good.names
inputfileMMPHC.trim.contigs.good.unique.fasta
mothur > count.seqs(name=inputfileMMPHC.trim.contigs.good.names , group=inputfileMMPHC.contigs.good.groups )
It took 120 secs to create a table for 10461063 sequences.
Total number of sequences: 10461063
Output File Names:
inputfileMMPHC.trim.contigs.good.count_table
mothur > summary.seqs(count=inputfileMMPHC.trim.contigs.good.count_table )
Using inputfileMMPHC.trim.contigs.good.unique.fasta as input file for the fasta parameter.
Using 8 processors.
**Start End NBases Ambigs Polymer NumSeqs**
**Minimum: 1 249 249 0 3 1**
**2.5%-tile: 1 252 252 0 3 261527**
**25%-tile: 1 253 253 0 4 2615266**
**Median: 1 253 253 0 4 5230532**
**75%-tile: 1 253 253 0 5 7845798**
**97.5%-tile: 1 253 253 0 6 10199537**
**Maximum: 1 274 274 0 11 10461063**
**Mean: 1 252 252 0 4**
**# of unique seqs: 363479**
**total # of seqs: 10461063**
**It took 10 secs to summarize 10461063 sequences.**
Output File Names:
inputfileMMPHC.trim.contigs.good.unique.summary
(base) ubuntu@liying-biggvl:/data/MMP_gutmicrobiome_2020$ wget https://mothur.s3.us-east-2.amazonaws.com/wiki/silva.seed_v138.tgz
--2020-06-24 21:08:53-- https://mothur.s3.us-east-2.amazonaws.com/wiki/silva.seed_v138.tgz
Resolving mothur.s3.us-east-2.amazonaws.com (mothur.s3.us-east-2.amazonaws.com)... 52.219.104.40
Connecting to mothur.s3.us-east-2.amazonaws.com (mothur.s3.us-east-2.amazonaws.com)|52.219.104.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13559908 (13M) [application/x-gzip]
Saving to: 'silva.seed_v138.tgz'
silva.seed_v138.tgz 100%[=========================================================================================================================================>] 12.93M 10.4MB/s in 1.2s
2020-06-24 21:08:55 (10.4 MB/s) - 'silva.seed_v138.tgz' saved [13559908/13559908]
(base) ubuntu@liying-biggvl:/data/MMP_gutmicrobiome_2020$ tar -xvzf silva.seed_v138.tgz
silva.seed_v138.tax
silva.seed_v138.align
README.md
mothur > pcr.seqs(fasta=silva.seed_v138.align , start=11894, end=25319, keepdots=F)
Using 8 processors.
957
957
957
957
956
957
957
958
[NOTE]: no sequences were bad, removing silva.seed_v138.bad.accnos
It took 2 secs to screen 7656 sequences.
Output File Names:
silva.seed_v138.pcr.align
mothur > rename.file(input=silva.seed_v138.align, new=silva.v4.align)
Current files saved by mothur:
fasta=silva.seed_v138.pcr.align
processors=8
mothur > rename.file(input=silva.v4.align, new=silva.v4.fasta)
Current files saved by mothur:
fasta=silva.seed_v138.pcr.align
processors=8
mothur > summary.seqs(fasta=silva.v4.fasta )
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
:
silva.v4.summaryMinimum: 1044 42803 1336 0 4 1
2.5%-tile: 1044 43116 1400 0 5 192
25%-tile: 1044 43116 1444 0 5 1915
Median: 1044 43116 1463 0 6 3829
75%-tile: 1044 43116 1500 0 6 5743
97.5%-tile: 1044 43116 1748 4 7 7465
Maximum: 1103 43116 2838 5 12 7656
Mean: 1044 43115 1520 0 5
# of Seqs: 7656
It took 2 secs to summarize 7656 sequences.
Output File Names
mothur > rename.file(input=silva.seed_v138.pcr.align, new=silva.v4.align)
Current files saved by mothur:
processors=8
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 12067 261 0 3 1
2.5%-tile: 1 13425 292 0 4 192
25%-tile: 1 13425 293 0 4 1915
Median: 1 13425 293 0 5 3829
75%-tile: 1 13425 296 0 6 5743
97.5%-tile: 1 13425 461 1 6 7465
Maximum: 3 13425 1122 5 10 7656
Mean: 1 13424 333 0 4
# of Seqs: 7656
It took 1 secs to summarize 7656 sequences.
Output File Names:
Silva.v4.summary
mothur > align.seqs(fasta=inputfileMMPHC.trim.contigs.good.unique.fasta , reference=silva.v4.align )
Using 8 processors.
It took 117 secs to align 363479 sequences.
[WARNING]: 18 of your sequences generated alignments that eliminated too many bases, a list is provided in inputfileMMPHC.trim.contigs.good.unique.flip.accnos.
[NOTE]: 6 of your sequences were reversed to produce a better alignment.
It took 117 seconds to align 363479 sequences.
Output File Names:
inputfileMMPHC.trim.contigs.good.unique.align
inputfileMMPHC.trim.contigs.good.unique.align.report
inputfileMMPHC.trim.contigs.good.unique.flip.accnos
mothur > summary.seqs(fasta=inputfileMMPHC.trim.contigs.good.unique.align, count=inputfileMMPHC.trim.contigs.good.count_table )
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 1235 1 0 1 1
2.5%-tile: 1968 11550 252 0 3 261527
25%-tile: 1968 11550 253 0 4 2615266
Median: 1968 11550 253 0 4 5230532
75%-tile: 1968 11550 253 0 5 7845798
97.5%-tile: 1968 11550 253 0 6 10199537
Maximum: 13425 13425 273 0 11 10461063
Mean: 1968 11549 252 0 4
# of unique seqs: 363479
total # of seqs: 10461063
It took 31 secs to summarize 10461063 sequences.
Output File Names:
inputfileMMPHC.trim.contigs.good.unique.summary
mothur > screen.seqs(fasta=inputfileMMPHC.trim.contigs.good.unique.align, count=inputfileMMPHC.trim.contigs.good.count_table, summary=inputfileMMPHC.trim.contigs.good.unique.summary , start=1968, end=11550, maxhomop=8)
It took 80 secs to screen 363479 sequences, removed 4959.
/******************************************/
Running command: remove.seqs(accnos=inputfileMMPHC.trim.contigs.good.unique.bad.accnos.temp, count=inputfileMMPHC.trim.contigs.good.count_table)
Removed 48300 sequences from your count file.
Output File Names:
inputfileMMPHC.trim.contigs.good.pick.count_table
/******************************************/
Output File Names:
inputfileMMPHC.trim.contigs.good.unique.good.summary
inputfileMMPHC.trim.contigs.good.unique.good.align
inputfileMMPHC.trim.contigs.good.unique.bad.accnos
inputfileMMPHC.trim.contigs.good.good.count_table
It took 94 secs to screen 363479 sequences.
mothur > summary.seqs(fasta=current, count=current)
Using inputfileMMPHC.trim.contigs.good.good.count_table as input file for the count parameter.
Using inputfileMMPHC.trim.contigs.good.unique.good.align as input file for the fasta parameter.
Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 11550 249 0 3 1
2.5%-tile: 1968 11550 252 0 3 260320
25%-tile: 1968 11550 253 0 4 2603191
Median: 1968 11550 253 0 4 5206382
75%-tile: 1968 11550 253 0 5 7809573
97.5%-tile: 1968 11550 253 0 6 10152444
Maximum: 1968 13425 273 0 8 10412763
Mean: 1967 11550 252 0 4
# of unique seqs: 358520
total # of seqs: 10412763
It took 31 secs to summarize 10412763 sequences.
Output File Names:
inputfileMMPHC.trim.contigs.good.unique.good.summary
mothur > filter.seqs(fasta=inputfileMMPHC.trim.contigs.good.unique.good.align , vertical=T, trump=.)
It took 21 secs to filter 358520 sequences.
Length of filtered alignment: 502
Number of columns removed: 12923
Length of the original alignment: 13425
Number of sequences used to construct filter: 358520
Output File Names:
inputfileMMPHC.filter
inputfileMMPHC.trim.contigs.good.unique.good.filter.fasta
mothur > unique.seqs(fasta=inputfileMMPHC.trim.contigs.good.unique.good.filter.fasta , count = inputfileMMPHC.trim.contigs.good.good.count_table )
Output File Names:
inputfileMMPHC.trim.contigs.good.unique.good.filter.count_table
inputfileMMPHC.trim.contigs.good.unique.good.filter.unique.fasta
mothur > pre.cluster(fasta=inputfileMMPHC.trim.contigs.good.unique.good.filter.unique.fasta , count = inputfileMMPHC.trim.contigs.good.unique.good.filter.count_table , diffs=2)
mothur > chimera.vsearch(fasta=inputfileMMPHC.trim.contigs.good.unique.good.filter.unique.precluster.fasta , count = inputfileMMPHC.trim.contigs.good.unique.good.filter.unique.precluster.count_table , dereplicate=t)
Output File Names:
inputfileMMPHC.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table
inputfileMMPHC.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.chimeras
inputfileMMPHC.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.accnos
mothur > remove.seqs(fasta=inputfileMMPHC.trim.contigs.good.unique.good.filter.unique.precluster.fasta , accnos=inputfileMMPHC.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.accnos )
Removed 78744 sequences from your fasta file.
Output File Names:
inputfileMMPHC.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta
mothur > summary.seqs(fasta=current, count=current)
Using inputfileMMPHC.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table as input file for the count parameter.
Using inputfileMMPHC.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta as input file for the fasta parameter.
Using 8 processors.
**Start End NBases Ambigs Polymer NumSeqs**
**Minimum: 1 502 229 0 3 1**
**2.5%-tile: 1 502 252 0 3 252954**
**25%-tile: 1 502 253 0 4 2529535**
**Median: 1 502 253 0 4 5059069**
**75%-tile: 1 502 253 0 5 7588603**
**97.5%-tile: 1 502 253 0 6 9865183**
**Maximum: 1 502 259 0 8 10118136**
**Mean: 1 502 252 0 4**
**# of unique seqs: 36090**
**total # of seqs: 10118136**
It took 0 secs to summarize 10118136 sequences.
Output File Names:
inputfileMMPHC.trim.contigs.good.unique.good.filter.unique.precluster.pick.summary