Hi Sarah,
I am currently using mothur 1.39 on Windows, because I have not yet been able to manually update the version of mothur included in the Bio-Linux distribution.
Below is the logfile of the session in which I noticed the discrepancy. I removed the per-group count lines to keep it short.
Windows version
Running 64Bit Version
mothur v.1.39.0
Last updated: 1/23/2017
by
Patrick D. Schloss
Department of Microbiology & Immunology
University of Michigan
http://www.mothur.org
When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.
Distributed under the GNU General Public License
Type ‘help()’ for information on the commands that are available
For questions and analysis support, please visit our forum at https://www.mothur.org/forum
Type ‘quit()’ to exit program
Interactive Mode
mothur >
get.current()
Current RAM usage: 4.68532 Gigabytes. Total Ram: 31.9458 Gigabytes.
Current default directory saved by mothur: E:\Sequencing_data\Bourg-Fidele_data\2017-01-26_Eleventh_try\
Current working directory: E:\Sequencing_data\Bourg-Fidele_data\2017-01-26_Eleventh_try\
mothur >
make.contigs(file=bf11_files.txt, processors=4)
[…]
It took 4505 secs to process 13447322 sequences.
Group count:
[…]
Total of all groups is 13447322
Output File Names:
bf11_files.trim.contigs.fasta
bf11_files.trim.contigs.qual
bf11_files.contigs.report
bf11_files.scrap.contigs.fasta
bf11_files.scrap.contigs.qual
bf11_files.contigs.groups
[WARNING]: your sequence names contained ‘:’. I changed them to ‘_’ to avoid problems in your downstream analysis.
mothur >
count.groups(group=current)
Using bf11_files.contigs.groups as input file for the group parameter.
[…]
Total seqs: 13447322.
Output File Names:
bf11_files.contigs.count.summary
mothur >
pcr.seqs(fasta=current, oligos=primers_16S.oligos, group=current, pdiffs=0, nomatch=reject, keepprimer=true)
Using bf11_files.trim.contigs.fasta as input file for the fasta parameter.
Using bf11_files.contigs.groups as input file for the group parameter.
Using 4 processors.
Removed 9723566 sequences from your group file.
Output File Names:
bf11_files.trim.contigs.pcr.fasta
bf11_files.trim.contigs.bad.accnos
bf11_files.trim.contigs.scrap.pcr.fasta
bf11_files.contigs.pcr.groups
It took 956 secs to screen 13447294 sequences.
mothur >
count.groups(group=current)
Using bf11_files.contigs.pcr.groups as input file for the group parameter.
[…]
Total seqs: 3723756.
Output File Names:
bf11_files.contigs.pcr.count.summary
mothur >
remove.groups(fasta=current, group=current, groups=S09-S48)
Using bf11_files.trim.contigs.pcr.fasta as input file for the fasta parameter.
Using bf11_files.contigs.pcr.groups as input file for the group parameter.
Removed 377 sequences from your fasta file.
Removed 377 sequences from your group file.
Output File names:
bf11_files.trim.contigs.pcr.pick.fasta
bf11_files.contigs.pcr.pick.groups
mothur >
count.groups(group=current)
Using bf11_files.contigs.pcr.pick.groups as input file for the group parameter.
[…]
Total seqs: 3723379.
Output File Names:
bf11_files.contigs.pcr.pick.count.summary
mothur >
screen.seqs(fasta=current, group=current, maxambig=0, maxhomop=10, minlength=400)
Using bf11_files.trim.contigs.pcr.pick.fasta as input file for the fasta parameter.
Using bf11_files.contigs.pcr.pick.groups as input file for the group parameter.
Using 4 processors.
Output File Names:
bf11_files.trim.contigs.pcr.pick.good.fasta
bf11_files.trim.contigs.pcr.pick.bad.accnos
bf11_files.contigs.pcr.pick.good.groups
It took 89 secs to screen 3723351 sequences.
mothur >
count.groups(group=current)
Using bf11_files.contigs.pcr.pick.good.groups as input file for the group parameter.
[…]
Total seqs: 2282158.
Output File Names:
bf11_files.contigs.pcr.pick.good.count.summary
mothur >
summary.seqs(fasta=current)
Using bf11_files.trim.contigs.pcr.pick.good.fasta as input file for the fasta parameter.
Using 4 processors.
             Start  End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     1      400      400      0       3        1
2.5%-tile:   1      440      440      0       4        57054
25%-tile:    1      440      440      0       5        570533
Median:      1      442      442      0       5        1141066
75%-tile:    1      464      464      0       6        1711598
97.5%-tile:  1      466      466      0       7        2225077
Maximum:     1      500      500      0       10       2282130
Mean:        1      449.212  449.212  0       5.39261
# of Seqs:   2282130
Output File Names:
bf11_files.trim.contigs.pcr.pick.good.summary
It took 42 secs to summarize 2282130 sequences.
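The discrepancy of 28 sequences appears right after make.contigs: “wc -l / 2” on the fasta file gives 13447294, whereas count.groups() reports 13447322. It then persists through the later steps: pcr.seqs screens 13447294 sequences, screen.seqs screens 3723351 sequences while the group file holds 3723379, and summary.seqs reports 2282130 sequences while the group file holds 2282158.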
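In case it helps to pin down which 28 names are involved, here is a minimal Python sketch (not part of the session above) that compares the sequence names in a fasta file with those in a group file. The function names are my own, and it assumes that a sequence name is the first whitespace-delimited token after ">" in the fasta file and the first column of the group file. It holds all names in memory, so it needs a fair amount of RAM for 13 million sequences.

```python
from collections import Counter


def fasta_names(lines):
    """Return the sequence names (first token after '>') from FASTA lines."""
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    ]


def group_names(lines):
    """Return the sequence names (first column) from group-file lines."""
    return [line.split()[0] for line in lines if line.strip()]


def compare(fasta_lines, group_lines):
    """Summarize how the two name lists differ."""
    f = Counter(fasta_names(fasta_lines))
    g = Counter(group_names(group_lines))
    return {
        "fasta_total": sum(f.values()),
        "group_total": sum(g.values()),
        "only_in_group": sorted(set(g) - set(f)),
        "only_in_fasta": sorted(set(f) - set(g)),
        "duplicated_in_group": sorted(n for n, c in g.items() if c > 1),
        "duplicated_in_fasta": sorted(n for n, c in f.items() if c > 1),
    }


# Example use with the files from the log (paths relative to the working directory):
# with open("bf11_files.trim.contigs.fasta") as fa, open("bf11_files.contigs.groups") as gr:
#     report = compare(fa, gr)
# print(report["fasta_total"], report["group_total"], len(report["only_in_group"]))
```

If the 28 extra entries are names that occur only in the group file, or names that occur twice there, the report should show them directly.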
Yours,
Maxime