I’ve been trying to work my way through the SOP on an HMP mock dataset. Everything seems to work OK until the pre.cluster step:
mothur > pre.cluster(fasta=reads.shhh.trim.unique.good.filter.unique.fasta, name=reads.shhh.trim.unique.good.filter.names, group=reads.shhh.good.groups, diffs=2)
[ERROR]: Your name file contains 973 valid sequences, and your groupfile contains 1653, please correct.
The commands prior to that were:
sffinfo(sff=reads.sff, flow=T)
trim.flows(flow=reads.flow, oligos=oligos.txt, pdiffs=2, bdiffs=1, processors=8, minflows=300, maxflows=300)
shhh.flows(file=reads.flow.files, processors=8)
trim.seqs(fasta=reads.shhh.fasta, name=reads.shhh.names, oligos=oligos.txt, pdiffs=2, bdiffs=1, maxhomop=8, minlength=200, flip=T, processors=8)
unique.seqs(fasta=reads.shhh.trim.fasta, name=reads.shhh.trim.names)
align.seqs(fasta=reads.shhh.trim.unique.fasta, reference=silva.bacteria.fasta, processors=8)
screen.seqs(fasta=reads.shhh.trim.unique.align, name=reads.shhh.trim.unique.names, group=reads.shhh.groups, end=27659, optimize=start, criteria=95, processors=8)
filter.seqs(fasta=reads.shhh.trim.unique.good.align, vertical=T, trump=., processors=8)
unique.seqs(fasta=reads.shhh.trim.unique.good.filter.fasta, name=reads.shhh.trim.unique.good.names)
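One way to narrow down the mismatch reported by pre.cluster is to list which sequence names appear in only one of the two files. The following is a minimal Python sketch, not part of mothur. It assumes the usual layouts: a names file with two whitespace-separated columns (a representative name, then a comma-separated list of every name it represents) and a group file with one sequence name and one group per line. The helper names `read_names`, `read_groups`, and `compare_names_and_groups` are hypothetical.

```python
def read_names(path):
    """Collect every sequence name listed in the second column of a names file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            names.update(parts[1].split(","))
    return names


def read_groups(path):
    """Collect every sequence name listed in the first column of a group file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            parts = line.split()
            if parts:
                names.add(parts[0])
    return names


def compare_names_and_groups(names_path, groups_path):
    """Return (names only in the names file, names only in the group file)."""
    in_names = read_names(names_path)
    in_groups = read_groups(groups_path)
    return in_names - in_groups, in_groups - in_names
```

Calling `compare_names_and_groups("reads.shhh.trim.unique.good.filter.names", "reads.shhh.good.groups")` on the two files passed to pre.cluster would return the two sets of unmatched names, which may help show at which earlier step the names and group files stopped agreeing.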