Hi. I’m looking to filter out columns of a multiple sequence alignment (MSA) that contain only identical characters, for example a column in which every sequence has “C”. I know that I can keep those columns with “soft” filtering in filter.seqs, but is there a way to remove the identical positions and keep only the variable ones?
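For illustration, here is a minimal Python sketch of the filter being asked for. It assumes the alignment is already loaded as a list of equal-length aligned strings, and that, as in a mask file, 1 marks a column to keep and 0 a column to drop. The function name variable_column_mask is hypothetical, not part of Mothur. Gap characters are compared like any other character, so a column that is all gaps counts as invariant.

```python
def variable_column_mask(aligned_seqs):
    """Return a bit string: '1' for a variable column, '0' for an invariant one.

    Hypothetical helper (not a Mothur function). Assumes 1 = keep, 0 = drop,
    and treats gap characters like any other character.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    bits = []
    for i in range(length):
        column = {s[i] for s in aligned_seqs}  # distinct characters in column i
        bits.append("1" if len(column) > 1 else "0")
    return "".join(bits)


alignment = ["ACGT-A", "ACTT-A", "AGGT-C"]
print(variable_column_mask(alignment))  # 011001
```

Writing the returned string to a file would give a bit-string filter of the kind discussed in the replies; whether filter.seqs accepts it unchanged (for example through its hard option) is worth checking against the Mothur documentation.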
You should be able to use the Arb program to export a filter file.
Arb lets you set minimum and maximum column identity thresholds for the filter.
I think the Arb filter file is the same format as Mothur’s lane mask file (just a bit string).
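As a rough illustration of how a bit-string mask is applied, the sketch below keeps only the columns marked 1. It assumes the mask is a plain string of 0s and 1s with one character per alignment column; apply_mask is a hypothetical helper, not a Mothur or Arb function.

```python
def apply_mask(aligned_seqs, mask):
    """Keep only the columns whose mask character is '1'.

    Hypothetical helper; assumes a plain 0/1 string, one character per column.
    """
    for s in aligned_seqs:
        if len(s) != len(mask):
            raise ValueError("mask and sequences must have the same length")
    # zip pairs each alignment character with its mask bit
    return ["".join(ch for ch, bit in zip(s, mask) if bit == "1") for s in aligned_seqs]


alignment = ["ACGT-A", "ACTT-A", "AGGT-C"]
print(apply_mask(alignment, "011001"))  # ['CGA', 'CTA', 'GGC']
```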
Of course, it would be cool if Mothur’s filter.seqs command supported this directly.
If there’s enough demand for such a feature, we could certainly put it in. Another option for now, if you don’t know Perl, would be to just use a text editor to invert the bits of the filter with three find-and-replace passes: 0->2, then 1->0, then 2->1. But of course, you should know Perl.
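For anyone who does want to script it, the three-pass swap translates directly into a few lines of code. This sketch uses Python rather than Perl, and invert_mask is a hypothetical name. The temporary 2 is what stops the 1->0 pass from also hitting the columns that were originally 0.

```python
def invert_mask(mask):
    """Flip a 0/1 filter string with the three text-editor passes.

    Hypothetical helper; assumes the mask contains only '0' and '1'.
    """
    step1 = mask.replace("0", "2")  # park the original 0s as a temporary 2
    step2 = step1.replace("1", "0")  # original 1s become 0s
    return step2.replace("2", "1")  # parked 2s become 1s


print(invert_mask("011001"))  # 100110
```

In Python the same swap can also be done in a single pass with mask.translate(str.maketrans("01", "10")), which avoids the temporary character altogether.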