Another issue... pre.cluster

Dear all,

I followed the SOP up to this point and got another error message. It seems I lost some sequences in the name file compared to the group file.
I have no idea which step might have gone wrong. Please help.

[b]mothur >
pre.cluster(fasta=s3.shhh.trim.unique.good.filter.unique.fasta, name=s3.shhh.trim.unique.good.filter.names, group=s3.shhh.good.groups, diffs=2)

Using 2 processors.

[ERROR]: Your name file contains 113588 valid sequences, and your groupfile contains 115543, please correct.
[ERROR]: process 0 only processed 1 of 5 groups assigned to it, quitting.

Running command: unique.seqs(fasta=s3.shhh.trim.unique.good.filter.unique.precluster.fasta, name=s3.shhh.trim.unique.good.filter.unique.precluster.names)
[ERROR]: s3.shhh.trim.unique.good.filter.unique.precluster.fasta is blank, aborting.
Using s3.shhh.trim.unique.good.filter.unique.fasta as input file for the fasta parameter.
[ERROR]: s3.shhh.trim.unique.good.filter.unique.precluster.names is blank, aborting.[/b]

Dear all,
I think the problem might stem from the align.seqs command. A file was created by this command named

This file contains the line:

I57WTBF03C0KZ4 reverse complement did NOT produce a better alignment so it was not used, please check sequence.

I assume this is a bad sequence and should be removed before any subsequent processing (screen.seqs, etc.).

Since I DID NOT remove this sequence, is this why I ran into the problem mentioned above?

If so, what would be the best way to remove these sequences? (remove.seqs? Which file should be used in this command?)

Please help.

I’ve had this issue in the past, so I modified my workflow to get around these sequences:

align.seqs(fasta=FILE.fasta, reference=XXX, flip=T)
system(grep "NOT" FILE.flip.accnos > bad_seqs.accnos)
remove.seqs(fasta=FILE.align, count=FILE.count_table, accnos=bad_seqs.accnos)

I started doing this a while back (before mothur used count tables), so I don’t know if it’s still necessary, but if you’ve encountered this problem it might be a helpful addition to your pipeline. I usually check that bad_seqs.accnos actually contains sequence names before I run remove.seqs, because sometimes the file is empty and the step isn’t needed. If you’re working on Windows, you’ll need to use ‘find’ instead of ‘grep’.
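If you’d rather not switch between grep and find depending on the OS, here is a rough cross-platform sketch of the same filtering step in Python. It assumes, as in the line quoted earlier in this thread, that each flagged line starts with the sequence name followed by a message containing "NOT"; it mirrors the grep match, keeps only that first field, and reports whether the output is empty so you can skip remove.seqs when there’s nothing to remove. The function name extract_bad_seqs is just for illustration.

```python
import sys
from pathlib import Path


def extract_bad_seqs(flip_accnos: Path, out_accnos: Path) -> list[str]:
    """Write the names of sequences flagged as poorly aligned to out_accnos.

    Assumption: a flagged line looks like
    "I57WTBF03C0KZ4 reverse complement did NOT produce a better alignment ..."
    i.e. the sequence name is the first whitespace-separated field.
    """
    bad = []
    with flip_accnos.open() as fh:
        for line in fh:
            if "NOT" in line:  # same substring match as grep "NOT"
                fields = line.split()
                if fields:
                    bad.append(fields[0])  # keep only the sequence name
    # An empty list produces an empty file, just like grep with no matches.
    out_accnos.write_text("".join(name + "\n" for name in bad))
    return bad


if __name__ == "__main__" and len(sys.argv) == 3:
    out = Path(sys.argv[2])
    names = extract_bad_seqs(Path(sys.argv[1]), out)
    if names:
        print(f"{len(names)} sequence(s) written to {out}; run remove.seqs")
    else:
        print(f"{out} is empty; remove.seqs is not needed")
```

Saved as a script (the file name is up to you), it would be run as `python extract_bad_seqs.py FILE.flip.accnos bad_seqs.accnos`, and the resulting bad_seqs.accnos goes into remove.seqs as above.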

When flip=t and the number of bases in the aligned sequence falls below 50% of the original number of bases, mothur will try to align the reverse complement of the sequence. When both the forward and the reverse-complement alignments lose more than 50% of the bases (a poor alignment), mothur reports the sequence in that file. You can easily remove these sequences by setting the minlength parameter in the screen.seqs command. You can also adjust the sensitivity of the flip threshold (the 50% number) with the threshold parameter of align.seqs; for example, threshold=0.60 would indicate 60%.
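To make the flip rule above a bit more concrete, here is a simplified Python model of the decision as described in this post. It is not mothur’s source code: the function choose_orientation and its arguments (base counts before and after alignment) are hypothetical, and it only captures the threshold logic, not how mothur actually aligns or scores sequences.

```python
def choose_orientation(original_bases: int, forward_bases: int, reverse_bases: int,
                       flip: bool = True, threshold: float = 0.50) -> tuple[str, bool]:
    """Return (orientation kept, whether the sequence gets reported).

    Simplified model: an alignment is "good" if it keeps at least
    `threshold` of the original bases.
    """
    if original_bases <= 0:
        raise ValueError("original_bases must be positive")
    # Forward alignment is good enough, or flipping is turned off.
    if forward_bases / original_bases >= threshold or not flip:
        return "forward", False
    # Forward was poor, so try the reverse complement.
    if reverse_bases / original_bases >= threshold:
        return "reverse", False
    # Both orientations lost too many bases: keep forward and report it.
    return "forward", True
```

For a 250-base read, `choose_orientation(250, 100, 230)` keeps the reverse complement, while `choose_orientation(250, 140, 110, threshold=0.60)` reports the sequence, because neither orientation keeps 60% of the bases.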

To deal with the file mismatch issue:

This is most often caused by leaving a name or group file off a command, or by a typo that gives mothur the wrong name or group file. You can resolve it using the list.seqs and get.seqs commands.

mothur > list.seqs(group=s3.shhh.good.groups) - list the sequences in your group file
mothur > get.seqs(fasta=s3.shhh.trim.unique.good.filter.unique.fasta, name=s3.shhh.trim.unique.good.filter.names, dups=false) - select only the names in the group file.
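If you want to see exactly which sequences are out of sync before fixing anything, a small Python sketch like this one compares the two files directly. It assumes the usual layouts: a names file with a representative name followed by a comma-separated list of every sequence it stands for, and a group file with one sequence name and its group per line. The helper names are made up for this example.

```python
from pathlib import Path


def read_names_file(path: Path) -> set[str]:
    """Collect every sequence name listed in a names file.

    Assumes two whitespace-separated columns per line: a representative
    name and a comma-separated list of the sequences it represents.
    """
    names = set()
    with path.open() as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 2:
                names.update(n for n in fields[1].split(",") if n)
    return names


def read_groups_file(path: Path) -> set[str]:
    """Collect sequence names from a group file (name, then group, per line)."""
    names = set()
    with path.open() as fh:
        for line in fh:
            fields = line.split()
            if fields:
                names.add(fields[0])
    return names


def compare(names_path: Path, groups_path: Path) -> tuple[set[str], set[str]]:
    """Return (names only in the names file, names only in the group file)."""
    in_names = read_names_file(names_path)
    in_groups = read_groups_file(groups_path)
    return in_names - in_groups, in_groups - in_names
```

With the files from the error above, `compare(Path("s3.shhh.trim.unique.good.filter.names"), Path("s3.shhh.good.groups"))` should return two sets whose sizes account for the difference between 113588 and 115543.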