Another issue...Pre.cluster

Thom2525 · October 13, 2015, 4:32am

Dear all,

I followed the SOP to this point and got another error message. It seems I lost some sequences in the name file compared to the group file.
I have no idea which step might have gone wrong. Please help.

mothur > pre.cluster(fasta=s3.shhh.trim.unique.good.filter.unique.fasta, name=s3.shhh.trim.unique.good.filter.names, group=s3.shhh.good.groups, diffs=2)

Using 2 processors.

[ERROR]: Your name file contains 113588 valid sequences, and your groupfile contains 115543, please correct.
[ERROR]: process 0 only processed 1 of 5 groups assigned to it, quitting.

/******************************************/
Running command: unique.seqs(fasta=s3.shhh.trim.unique.good.filter.unique.precluster.fasta, name=s3.shhh.trim.unique.good.filter.unique.precluster.names)
[ERROR]: s3.shhh.trim.unique.good.filter.unique.precluster.fasta is blank, aborting.
Using s3.shhh.trim.unique.good.filter.unique.fasta as input file for the fasta parameter.
[ERROR]: s3.shhh.trim.unique.good.filter.unique.precluster.names is blank, aborting.

Thom2525 · October 13, 2015, 11:13am

Dear all,
I think the problem might stem from the align.seqs command. After running it, a file was created named

sample3.shhh.trim.unique.flip.accnos
This file contained the line

I57WTBF03C0KZ4 reverse complement did NOT produce a better alignment so it was not used, please check sequence.

I assume this is a bad sequence and should be removed before any subsequent processing (screen.seqs etc.).

Is the problem mentioned above caused by the fact that I DID NOT remove this sequence?

If so, what would be the best way to remove these sequences (remove.seqs? Which file should be used in this command?)

Please help.

dwaite · October 13, 2015, 7:00pm

I’ve had this issue in the past, so I modified the workflow to get around these sequences:

align.seqs(fasta=FILE.fasta, reference=XXX, flip=T)
system(grep "NOT" FILE.flip.accnos > bad_seqs.accnos)
remove.seqs(fasta=FILE.align, count=FILE.count_table, accnos=bad_seqs.accnos)
summary.seqs()
screen.seqs()

I started doing this a while back (before mothur used count tables) so I don’t know if it’s necessary anymore, although if you’ve encountered this problem then this might be a helpful addition to your pipeline. Typically I’ll check that there are actually sequence names in the bad_seqs.accnos file before I run remove.seqs because sometimes it’s not necessary. And if you’re working on Windows you’ll need to use ‘find’ instead of ‘grep’.
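For anyone who would rather not depend on grep or find, the filtering step can be sketched in Python. This is only a rough illustration, not part of mothur: it assumes each flagged line in the .flip.accnos file starts with the sequence name followed by a message containing the word NOT (as in the line quoted earlier in this thread), and the function names and file paths are hypothetical.

```python
def extract_bad_names(lines):
    """Return the first field of every line whose message contains the word NOT.

    Assumes the line layout '<sequence name> <message ...>', as in the
    flip.accnos line quoted in this thread.
    """
    names = []
    for line in lines:
        fields = line.split()
        # fields[0] is the sequence name; look for NOT only in the message part.
        if len(fields) > 1 and "NOT" in fields[1:]:
            names.append(fields[0])
    return names


def write_bad_accnos(flip_path, out_path):
    """Write one flagged sequence name per line; return how many were written."""
    with open(flip_path) as handle:
        names = extract_bad_names(handle)
    with open(out_path, "w") as out:
        for name in names:
            out.write(name + "\n")
    return len(names)
```

Calling write_bad_accnos("FILE.flip.accnos", "bad_seqs.accnos") returns the number of names written, so a result of 0 indicates that remove.seqs can be skipped.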

westcott · October 19, 2015, 9:08pm

When the number of bases in the aligned sequence falls below 50% of the original number of bases and flip=t, mothur will try to align the reverse complement of the sequence. When both the reverse and the forward sequence alignments result in more than a 50% reduction in the number of bases (a poor alignment), mothur reports it in the file. You can easily remove these sequences by setting the minlength parameter in the screen.seqs command. You can also adjust the sensitivity of the flip threshold (the 50% number) in the align.seqs command with the threshold parameter. For example, threshold=0.60 would indicate 60%.
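The decision described above can be written out as a small Python sketch. It is a simplified reading of that description, not mothur's actual code: the function name, its arguments, and the three return labels are made up for illustration, and the base counts are assumed to be already known.

```python
def classify_alignment(original_bases, forward_bases, reverse_bases, threshold=0.50):
    """Mimic the flip logic as described in this post (hypothetical helper).

    original_bases: number of bases in the unaligned sequence
    forward_bases:  bases remaining after aligning the sequence as given
    reverse_bases:  bases remaining after aligning its reverse complement
    threshold:      fraction of original bases below which an alignment is poor
    """
    cutoff = threshold * original_bases
    if forward_bases >= cutoff:
        return "forward"   # forward alignment is good enough, no flip attempted
    if reverse_bases >= cutoff:
        return "reverse"   # the reverse complement aligned acceptably
    return "reported"      # both alignments were poor, sequence gets flagged


# With threshold=0.60 a 100-base read needs at least 60 aligned bases to pass.
print(classify_alignment(100, 55, 20, threshold=0.60))
```

Raising the threshold makes the check stricter, so more sequences are flipped or flagged.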

To deal with the file mismatch issue:

This is most often caused when a name or group file is left off a command, or when a typo is made and the wrong name or group file is given. You can resolve it using the list.seqs and get.seqs commands. http://www.mothur.org/wiki/List.seqs http://www.mothur.org/wiki/Get.seqs

mothur > list.seqs(group=s3.shhh.good.groups) - list the sequences in your group file
mothur > get.seqs(fasta=s3.shhh.trim.unique.good.filter.unique.fasta, name=s3.shhh.trim.unique.good.filter.names, dups=false) - select only the names in the group file.
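Before repairing the files, it can help to see exactly which sequence names differ between them. The Python sketch below is an illustration outside mothur. It assumes the usual layouts: a name file with two whitespace-separated columns (representative name, then a comma-separated list of names) and a group file with the sequence name in the first column. The function names are hypothetical.

```python
def names_in_name_file(lines):
    """Collect every sequence name listed in a mothur-style name file."""
    names = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            names.add(fields[0])                # representative sequence
            names.update(fields[1].split(","))  # all sequences it stands for
    return names


def names_in_group_file(lines):
    """Collect the sequence names (first column) of a mothur-style group file."""
    return {line.split()[0] for line in lines if line.strip()}


def compare_name_and_group(name_lines, group_lines):
    """Return (only_in_group, only_in_name) as sorted lists of names."""
    in_name = names_in_name_file(name_lines)
    in_group = names_in_group_file(group_lines)
    return sorted(in_group - in_name), sorted(in_name - in_group)
```

Passing open file handles for the name and group files returns the two lists of mismatched names, which shows which file carries the extra sequences and can hint at the step where they diverged.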
