How to remove poor-quality alignments?

Hello! After I finished my mothur analysis (publication is coming), I realized that the sequences I used are not as good as I’d like. Let me explain why.

Before running the “align.seqs” command I had approximately 90,000 sequences. When I run the alignment, each sequence is reported with its best hit, but the command doesn’t discriminate between good and bad alignments.

Many sequences have a similarity between query and hit under 90%, and I’d like to use only sequences with a similarity of at least about 97% (approximately 17,000 of my sequences). I’ve tried Perl scripts to extract only what I need from the fasta file that I use in the alignment and then repeat the alignment, but when I continue with the analysis I get a lot of errors about inconsistencies between the groups, names, and fasta files.
Here is an example:

mothur > pre.cluster(fasta=sff.unique.good.filter.unique.fasta, name=sff.unique.good.filter.names, group=sff.good.groups, diffs=2)

Using 1 processors.

[ERROR]: Your name file contains 73043 valid sequences, and your groupfile contains 199832, please correct.

Running command: unique.seqs(fasta=sff.unique.good.filter.unique.precluster.fasta, name=sff.unique.good.filter.unique.precluster.names)
[ERROR]: sff.unique.good.filter.unique.precluster.fasta is blank, aborting.
Using sff.unique.good.filter.unique.fasta as input file for the fasta parameter.
[ERROR]: sff.unique.good.filter.unique.precluster.names is blank, aborting.

My question is: is there a command that tells mothur “please remove all the sequences with alignments under XX% from my fasta, groups, and names files”?

I’d be really glad if somebody could help me.


No, there isn’t. This seems like an overly strict method of screening your sequences.

The errors about mismatching files typically occur because the correct fasta, names, and group files were not all used together in a remove.seqs/get.seqs step, so one file was filtered while the others still list the removed sequences.
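
If you still want to screen by similarity, one way to keep the three files in sync is to build a list of sequence names (an accnos file) and let remove.seqs do the filtering, rather than editing the fasta file by hand. Below is a minimal Python sketch of that idea, not a mothur feature. It assumes the align.report file written by align.seqs is tab-delimited with a header row containing a "QueryName" column and a similarity column whose name starts with "SimBtwnQuery"; please check these against the header of your own report. The function names and the 97% default are only illustrative.

```python
import csv


def low_similarity_names(report_path, min_similarity=97.0):
    """Return the query names whose similarity to the template is below the cutoff."""
    names = []
    with open(report_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        # Assumed column names; verify them against your own align.report header.
        name_col = header.index("QueryName")
        sim_col = next(
            i for i, title in enumerate(header) if title.startswith("SimBtwnQuery")
        )
        for row in reader:
            if not row:
                continue  # skip blank lines
            if float(row[sim_col]) < min_similarity:
                names.append(row[name_col])
    return names


def write_accnos(names, accnos_path):
    """Write one sequence name per line, the layout used by accnos files."""
    with open(accnos_path, "w") as out:
        for name in names:
            out.write(name + "\n")
```

You would call low_similarity_names on your report, pass the result to write_accnos, and then give the resulting file to remove.seqs through its accnos parameter, together with the fasta, name, and group parameters pointing at the files from that same step. Filtering all three files in a single remove.seqs call is what avoids the "name file contains ... and your groupfile contains ..." mismatch shown above.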