consensus filter

To reduce memory requirements, I wonder whether mothur could generate consensus filters for aligned fasta files.

That is, suppose we process two separate runs up to the filter.seqs step. It would be great if we could:
(i) generate the filters separately
(ii) merge the two (or x) filters into a consensus filter

I’m working on a shell script to do just that, but it may be a feature that proves popular.

Cheers,
SR

Should anyone be interested, below is the script I wrote for the macOS shell. Note that the filters.tomerge file is the concatenation of the individual filters, one per line. I realize there's probably a faster way to do this, but through my bumbling I managed to write a script that works.
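Before the script itself, here is a minimal sketch of how the filters.tomerge file it reads might be put together. The names run1.filter and run2.filter and their contents are made up for illustration, and the sketch assumes each filter file is a single line of 0s and 1s ending in a newline.

```shell
# Hypothetical per-run filters; in practice these would come from
# running filter.seqs on each run separately.
printf '0110\n' > run1.filter
printf '0011\n' > run2.filter

# Stack the filters one below the other in the file the script expects.
cat run1.filter run2.filter > filters.tomerge

# Sanity check: every filter should have the same length,
# so this should print a single number.
awk '{ print length($0) }' filters.tomerge | sort -u
```

If the check prints more than one number, the filters were built from alignments of different lengths and should not be merged.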


# Shell script to generate consensus filter from multiple *.align files in mothur
# First, we add a space after each digit.
sed 's/[0-9]/& /g' filters.tomerge > temp1.filter

# Second, we transpose the resulting matrix.
awk '{ for (i = 1; i <= NF; i++) f[i] = f[i] " " $i ;
       if (NF > n) n = NF }
     END { for (i = 1; i <= n; i++) sub(/^ */, "", f[i]) ;
           for (i = 1; i <= n; i++) print f[i] }
' temp1.filter > temp2.filter

# Third, we remove all spaces.
sed 's/ //g' temp2.filter > temp3.filter

# Fourth, if a row is positive (at least one filter has a 1 at that position),
# we append a line containing 1 to a file; otherwise we append 0.
: > temp4.filter   # start from an empty file so reruns do not append to old output
while read -r LINE; do
    if [ "$LINE" -gt 0 ]; then
        echo "1" >> temp4.filter
    elif [ "$LINE" -eq 0 ]; then
        echo "0" >> temp4.filter
    fi
done < temp3.filter

# Another transpose, to turn the column of 0s and 1s back into a single row.
awk '{ for (i = 1; i <= NF; i++) f[i] = f[i] " " $i ;
       if (NF > n) n = NF }
     END { for (i = 1; i <= n; i++) sub(/^ */, "", f[i]) ;
           for (i = 1; i <= n; i++) print f[i] }
' temp4.filter > temp5.filter

# Finally, we remove all spaces.
sed 's/ //g' temp5.filter > consensus.filter

rm -f temp1.filter temp2.filter temp3.filter temp4.filter temp5.filter
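
For what it's worth, the same merge can probably be done in one pass without temporary files. The following is only a sketch, not a tested replacement: consensus_filter is a made-up function name, and it assumes the input has one filter per line made up of 0s and 1s. It looks at each character directly rather than reading a whole row as a number, which may matter because the numeric test in the fourth step could stop working once enough filters are merged that a row no longer fits in a shell integer.

```shell
# consensus_filter FILE
# Print one line in which position i is 1 if any filter in FILE
# has a 1 at position i, and 0 otherwise.
consensus_filter() {
    awk '
        {
            len = length($0)
            if (len > n) n = len          # remember the longest filter seen
            for (i = 1; i <= len; i++)
                if (substr($0, i, 1) == "1") keep[i] = 1
        }
        END {
            out = ""
            for (i = 1; i <= n; i++)
                out = out ((i in keep) ? "1" : "0")
            print out
        }
    ' "$1"
}
```

Usage would be along the lines of `consensus_filter filters.tomerge > consensus.filter`. As in the script above, a column is kept if at least one of the individual filters keeps it.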