filter columns

Hi. I’m looking to filter out columns of a multiple sequence alignment (MSA) that contain identical characters, for example a column that contains only “C”. I know that I can keep those columns with “soft” filtering in filter.seqs, but is there a way to remove the identical positions and leave the variable ones?

thanks,
Jason

You should be able to use the Arb program to export a filter file.
Arb supports minimum and maximum column identities.
I think the Arb filter file is the same format as Mothur’s lane mask file (just a bit string).
Of course, it would be cool if Mothur’s filter.seqs command supported this directly.

Robin

Alternatively, use filter.seqs(soft=100) and then write a Perl script to flip the bits of the resulting mask.

Robin
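
For anyone who would rather build the mask straight from the alignment, the idea can be sketched in Python: mark a column 1 if it holds more than one distinct character and 0 if every sequence agrees. This is only a sketch under assumptions. The function names are made up for illustration, the sequences are assumed to be already aligned to equal length, and gap characters are counted like any other character, which may not match how filter.seqs treats them.

```python
def variable_column_mask(seqs):
    """Return a bit string: '1' for variable columns, '0' for identical ones."""
    if not seqs:
        return ""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # zip(*seqs) walks the alignment column by column.
    return "".join("1" if len(set(col)) > 1 else "0" for col in zip(*seqs))


def apply_mask(seq, mask):
    """Keep only the positions of seq where the mask has a '1'."""
    return "".join(base for base, bit in zip(seq, mask) if bit == "1")


aligned = ["ACGT-A", "ACCT-A", "ACGTTA"]  # toy alignment, not real data
mask = variable_column_mask(aligned)
print(mask)
print([apply_mask(s, mask) for s in aligned])
```

The comparison is case-sensitive as written; upper-casing the sequences first would be a reasonable tweak if the alignment mixes cases.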

Thanks Robin, that’s a great idea. I have been trying to avoid using Arb if at all possible and this should do the trick.

Jason

If there’s enough demand for such a feature we could certainly put it in. Another option for now, if you don’t know Perl, would be just to use a text editor to do the flipping: 0->2, 1->0, 2->1. But of course, you should know Perl :)

perl -pe 'tr/01/10/' input > output
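
If Perl isn’t handy, the same flip can be sketched in Python. This assumes the filter is a plain text string of the characters 0 and 1; the function names and any file paths passed to them are placeholders, not anything mothur defines.

```python
def flip_mask(mask):
    """Swap 0 and 1 in a filter bit string; leave any other character alone."""
    return mask.translate(str.maketrans("01", "10"))


def flip_mask_file(src, dst):
    """Read a mask from src (hypothetical path) and write the flipped mask to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(flip_mask(line))


print(flip_mask("1101001"))
```

Because only the characters 0 and 1 are translated, a trailing newline in the filter file passes through unchanged.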

When I generate my filter and look at it in vim, the filter is a binary file:

11111111111^@1111^@^@11^@11^@11^@11^@11^@1…

I was expecting 0s. Is there an easy way to convert it? Or am I doing something wrong?

thanks,
Jason
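
One thing worth checking, purely as a guess from the vim output above: ^@ is how vim displays a NUL byte (0x00), so the zeros may have been written as raw NUL bytes instead of the character “0”. If that assumption holds, a small Python sketch like this would convert them. The function name is made up, and the sample bytes only imitate the start of the line shown above.

```python
def nul_to_zero(raw):
    """Replace NUL bytes (shown as ^@ in vim) with the ASCII character '0'."""
    return raw.replace(b"\x00", b"0")


# Sample imitating the beginning of the filter shown in the post.
sample = b"11111111111\x001111\x00\x0011\x0011"
print(nul_to_zero(sample).decode("ascii"))
```

To use it on a real file, read the filter in binary mode (open(path, "rb")), pass the bytes through nul_to_zero, and write the result back out in binary mode. If the ^@ characters turn out to mean something else, this conversion would be wrong, so it is worth comparing the result against the alignment first.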