missing name after pre.cluster command (v1.25.0)

stef · May 22, 2012, 1:56pm

Hi there,

mothur usually works fine for us.

We analysed our .sff data (reverse primers) with mothur v1.24.1 and it worked fine. With the same input data and the same commands in mothur v1.25.0, we now get an error message after running the pre.cluster command:

sffinfo(sff=in.sff, flow=T)
summary.seqs(fasta=in.fasta)
trim.flows(flow=in.flow, oligos=oligos.txt, bdiffs=1, pdiffs=2, minflows=360, maxflows=720, processors=8)
shhh.flows(file=in.flow.files, processors=8)
trim.seqs(fasta=in.shhh.fasta, name=in.shhh.names, oligos=oligos.txt, flip=T, pdiffs=2, bdiffs=1, maxhomop=8, minlength=200, processors=8)
summary.seqs(fasta=in.shhh.trim.fasta, name=in.shhh.trim.names)
unique.seqs(fasta=in.shhh.trim.fasta, name=in.shhh.trim.names)
summary.seqs(fasta=in.shhh.trim.unique.fasta, name=in.shhh.trim.names)
align.seqs(fasta=in.shhh.trim.unique.fasta, reference=silva.bacteria.fasta, processors=8)
summary.seqs(fasta=in.shhh.trim.unique.align, name=in.shhh.trim.names)
screen.seqs(fasta=in.shhh.trim.unique.align, name=in.shhh.trim.names, group=in.shhh.groups, optimize=start-end, criteria=99, minlength=200, processors=8)
summary.seqs(fasta=in.shhh.trim.unique.good.align, name=in.shhh.trim.good.names)
count.groups(group=in.shhh.good.groups)
filter.seqs(fasta=in.shhh.trim.unique.good.align, vertical=T, trump=., processors=8)
unique.seqs(fasta=in.shhh.trim.unique.good.filter.fasta, name=in.shhh.trim.good.names)

pre.cluster(fasta=in.shhh.trim.unique.good.filter.unique.fasta, name=in.shhh.trim.unique.good.filter.names, group=in.shhh.good.groups, diffs=2)
missing name HB93FIC05F00A2
…
missing name HB93FIC05GNOSQ

[ERROR]: Your name file contains 28709 valid sequences, and your groupfile contains 44039, please correct.

All in all, 15330 names seem to be missing from the names file (or should be removed from the group file). Where did we go wrong?

We already tried to adapt the screen.seqs command, but the error messages still popped up.

Perhaps, somebody can help us to figure out how to improve the commands in order to avoid those error messages?

Any help is highly appreciated.

Thanks a lot in advance.

Regards, stef

pschloss · May 22, 2012, 6:10pm

screen.seqs(fasta=in.shhh.trim.unique.align, name=in.shhh.trim.names, group=in.shhh.groups, optimize=start-end, criteria=99, minlength=200, processors=8)

I think what you want is…

screen.seqs(fasta=in.shhh.trim.unique.align, name=in.shhh.trim.unique.names, group=in.shhh.groups, optimize=start-end, criteria=99, minlength=200, processors=8)
in.shhh.trim.unique.names is the output from unique.seqs
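For anyone who hits the same mismatch, a quick way to see which names are out of sync is to compare the two files outside mothur. The following is only a rough sketch, not part of mothur: it assumes the usual two-column layouts (names file: representative name, then a comma-separated list of names; group file: sequence name, then group), and the helper names `read_names`, `read_groups`, and `compare` as well as the sample data are made up for illustration.

```python
import io


def read_names(handle):
    """Collect every sequence name listed in a names file."""
    names = set()
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        names.add(fields[0])
        names.update(fields[1].split(","))
    return names


def read_groups(handle):
    """Collect every sequence name (first column) in a group file."""
    return {line.split()[0] for line in handle if line.strip()}


def compare(names_handle, group_handle):
    """Return (names only in the group file, names only in the names file)."""
    in_names = read_names(names_handle)
    in_groups = read_groups(group_handle)
    return in_groups - in_names, in_names - in_groups


# Made-up example data, not taken from the thread.
names_text = "seqA\tseqA,seqB\nseqC\tseqC\n"
group_text = "seqA\tsample1\nseqB\tsample1\nseqC\tsample2\nseqD\tsample2\n"

only_in_groups, only_in_names = compare(
    io.StringIO(names_text), io.StringIO(group_text)
)
print(sorted(only_in_groups))  # ['seqD']
print(sorted(only_in_names))   # []
```

With real data you would pass open file handles for the names and group files instead of the `io.StringIO` objects.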

stef · May 24, 2012, 6:56am

Thank you so much for your reply.

I am currently running the modified script, and everything looks perfect now.

Thanks again, stef

kbrann3 · December 11, 2013, 5:33pm

I recently discovered the pre.cluster command, which has greatly reduced the time needed to cluster my set of 10 environmental samples. I am following a protocol very similar to the one in the above post, but I do not use groups at this point; I make groups and a shared file after I finish clustering. I am not sure I understand how pre.cluster deals with grouping sample names together, and I am afraid I am losing data. Please see below for the number of sequences at each step:

…good.filter.fasta = 249,448
…good.filter.unique.fasta = 58,766
…good.filter.unique.precluster.fasta = 48,486
…good.filter.unique.precluster.an.list = 58,766

I am concerned that if there is a unique sequence that is shared between samples, my end product does not parse that information back out and I am left with a unique sequence from only one sample.

Thanks,
kbrann3

pschloss · December 11, 2013, 8:50pm

kbrann3-

I suspect you’re not giving the names file to either unique.seqs or pre.cluster
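One way to check this is to count what a names file actually represents at each step. The sketch below is an illustration only, not a mothur feature: it assumes the standard two-column names file (representative name, then a comma-separated list of all names it stands for), and the function name `count_names` and the sample data are hypothetical. If the names file is carried through unique.seqs and pre.cluster, I would expect the total to stay at the original read count while the number of representatives drops.

```python
import io


def count_names(handle):
    """Return (number of representative sequences, total sequences represented)."""
    unique = 0
    total = 0
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        unique += 1
        total += len(fields[1].split(","))
    return unique, total


# Made-up example data, not taken from the thread.
names_text = "seqA\tseqA,seqB,seqC\nseqD\tseqD\n"
print(count_names(io.StringIO(names_text)))  # (2, 4)
```

Running this on the names file from each step (using an open file handle in place of `io.StringIO`) shows whether any reads drop out along the way.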
