consensus filter

TreeStump · July 29, 2014, 1:02pm

To reduce memory requirements, would it be possible to generate consensus filters for aligned fasta files?

That is, suppose we process two separate runs up to the filter.seqs step. It would be great if:
(i) we could generate the filters separately, and
(ii) we could merge the two (or x) filters into a consensus filter.
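One reading of (ii), and the one the script further down implements, is a logical OR: a column survives in the consensus if any run's filter keeps it (1). A minimal sketch of that rule on two filter strings, using a hypothetical helper `merge_two` and assuming both strings have the same length:

```shell
#!/bin/sh
# Hypothetical helper: OR two equal-length filter strings of 0s and 1s.
# A position is kept (1) if either filter keeps it.
merge_two() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        for (i = 1; i <= length(a); i++) {
            keep = (substr(a, i, 1) == "1" || substr(b, i, 1) == "1")
            printf "%s", (keep ? "1" : "0")
        }
        printf "\n"
    }'
}

merge_two 0110 1100   # prints 1110
```

Merging x filters is the same rule applied to every filter at once.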

I’m working on a shell script to do just that, but it may be a feature that proves popular.

Cheers,
SR

TreeStump · July 30, 2014, 2:30pm

Should anyone be interested, below is the script I wrote for the shell on a Mac. Note that the filters.tomerge file is the concatenation of the filters, one below the other. I realize there's probably a faster way, but through my bumbling, I managed to write a script that works.
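The input file can be built with `cat`. A toy sketch, where `run1.filter` and `run2.filter` are hypothetical names standing in for the filter files from each run:

```shell
#!/bin/sh
# Work in a scratch directory so the toy files do not overwrite real ones.
cd "$(mktemp -d)" || exit 1

# Toy filters standing in for the per-run .filter files (hypothetical names).
printf '0110\n' > run1.filter
printf '1100\n' > run2.filter

# Stack the filters one below the other, one filter per line.
cat run1.filter run2.filter > filters.tomerge
cat filters.tomerge
```

With x runs, list all x filter files in the `cat` command.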

First, we add a space after every digit.

```shell
# Shell script to generate a consensus filter from multiple *.align files in mothur
sed 's/[0-9]/& /g' filters.tomerge > temp1.filter
```
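On a toy input of two 4-column filters, this step turns each filter into space-separated fields, one per alignment column. Each output line ends in a trailing space, which awk ignores when it splits fields in the next step:

```shell
#!/bin/sh
# Step 1 on toy data: write a space after every digit.
printf '0110\n1100\n' | sed 's/[0-9]/& /g'
```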

Second, we transpose the resulting matrix.

```shell
# Collect field i of every row into f[i], so column i becomes row i.
awk '{ for (i = 1; i <= NF; i++) f[i] = f[i] " " $i ;
       if (NF > n) n = NF }
     END { for (i = 1; i <= n; i++) sub(/^ */, "", f[i]) ;
           for (i = 1; i <= n; i++) print f[i] }
' temp1.filter > temp2.filter
```
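On the toy input, the transpose gives one row per alignment column and one field per filter. A minimal sketch, wrapping the same awk program in a hypothetical `transpose` function so it can be tried on piped data:

```shell
#!/bin/sh
# Same awk transpose as above; reads files given as arguments, or stdin.
transpose() {
    awk '{ for (i = 1; i <= NF; i++) f[i] = f[i] " " $i
           if (NF > n) n = NF }
         END { for (i = 1; i <= n; i++) { sub(/^ */, "", f[i]); print f[i] } }' "$@"
}

# Two 4-column filters become four 2-field rows: 0 1 / 1 1 / 1 0 / 0 0
printf '0 1 1 0 \n1 1 0 0 \n' | transpose
```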

Third, we remove all spaces.

```shell
sed 's/ //g' temp2.filter > temp3.filter
```
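After this step each row is one alignment column, read across the filters: on the toy data, `01` means the first filter drops that column and the second keeps it. `tr -d ' '` is a common equivalent for this step:

```shell
#!/bin/sh
# Step 3 on toy transposed rows: remove the spaces between digits.
printf '0 1\n1 1\n1 0\n0 0\n' | sed 's/ //g'
# Same result with tr.
printf '0 1\n1 1\n1 0\n0 0\n' | tr -d ' '
```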

Fourth, if a row contains at least one 1 (i.e., at least one filter keeps that column), we append 1 as a new line of temp4.filter; otherwise we append 0.

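```shell
# Start from an empty file so a rerun does not append to old output.
: > temp4.filter
while read -r LINE; do
    # Match the text instead of using -gt: with many filters a row such
    # as 0000000000000000000001 is too long to compare as an integer.
    case $LINE in
        *1*) echo "1" >> temp4.filter ;;
        *)   echo "0" >> temp4.filter ;;
    esac
done < temp3.filter
```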
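As a possible shortcut (a different technique from the loop above), awk can make the keep/drop decision directly on the transposed rows from the second step, skipping both the space removal and the `while` loop. A sketch using a hypothetical `row_bits` function:

```shell
#!/bin/sh
# Print 1 for any row that contains a "1" (some filter keeps the column),
# otherwise 0. Works on rows with or without spaces between digits.
row_bits() {
    awk '{ print (/1/ ? 1 : 0) }' "$@"
}

# Toy transposed rows; prints 1, 1, 1, 0 on separate lines.
printf '0 1\n1 1\n1 0\n0 0\n' | row_bits
```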

Fifth, we transpose again, turning the single column of 0s and 1s back into one row.

```shell
# Same transpose as before: the one-column file becomes a single row.
awk '{ for (i = 1; i <= NF; i++) f[i] = f[i] " " $i ;
       if (NF > n) n = NF }
     END { for (i = 1; i <= n; i++) sub(/^ */, "", f[i]) ;
           for (i = 1; i <= n; i++) print f[i] }
' temp4.filter > temp5.filter
```

Finally, we remove all spaces.

```shell
sed 's/ //g' temp5.filter > consensus.filter
```
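Because temp4.filter has exactly one digit per line, the second transpose and the final space removal could likely be collapsed into one `paste -s` call; in POSIX `paste`, the delimiter `\0` means "no delimiter". A sketch on toy data:

```shell
#!/bin/sh
# Work in a scratch directory so the toy files do not overwrite real ones.
cd "$(mktemp -d)" || exit 1

# Toy stand-in for temp4.filter: one 0/1 per line, one line per column.
printf '1\n1\n1\n0\n' > temp4.filter

# Join all lines into one line with nothing between them.
paste -s -d '\0' temp4.filter > consensus.filter
cat consensus.filter
```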

```shell
rm -f temp1.filter temp2.filter temp3.filter temp4.filter temp5.filter
```
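Since the goal was lower memory use, a single awk pass may also work: it ORs the filters column by column without temporary files, keeping at most one array entry per alignment column instead of the whole transposed matrix. This is a swapped-in technique, not the script above, and it assumes every line of filters.tomerge is one filter of equal length made only of 0s and 1s; `consensus` is a hypothetical function name:

```shell
#!/bin/sh
# Work in a scratch directory so the toy files do not overwrite real ones.
cd "$(mktemp -d)" || exit 1

# OR all filters (one per input line) into a single consensus filter.
consensus() {
    awk '
    {
        n = length($0)
        for (i = 1; i <= n; i++)
            if (substr($0, i, 1) == "1") keep[i] = 1   # column kept by this filter
        if (n > max) max = n
    }
    END {
        # One character per column, then a single newline.
        for (i = 1; i <= max; i++) printf "%s", ((i in keep) ? "1" : "0")
        printf "\n"
    }' "$@"
}

# Toy two-filter input.
printf '0110\n1100\n' > filters.tomerge
consensus filters.tomerge > consensus.filter
cat consensus.filter
```

On this toy input it gives the same consensus as the temp-file pipeline.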
