filter w/ trump=.

Hello mothur folks;

I just wanted to check whether some of my results sound reasonable to you.

I’m analyzing a data set following the Costello example. I get a dramatic decrease in the number of bases when I filter with trump=.

When I summarize before removing chimeras, the data looks like this:

              Start   End     NBases  Ambigs  Polymer
Minimum:      1044    1764    150     0       3
2.5%-tile:    1044    4072    154     0       3
25%-tile:     1044    5323    208     0       4
Median:       1044    6428    308     0       4
75%-tile:     1044    8508    370     0       5
97.5%-tile:   1044    10247   407     0       6
Maximum:      1044    13855   442     0       8

# of Seqs:    71322

After I filter (vertical=T and trump=.), I get:

              Start   End     NBases  Ambigs  Polymer
Minimum:      1       351     44      0       2
2.5%-tile:    1       351     50      0       2
25%-tile:     1       351     54      0       3
Median:       1       351     60      0       3
75%-tile:     1       351     62      0       4
97.5%-tile:   1       351     69      0       5
Maximum:      1       351     183     0       7

# of Seqs:    46341


Does this sound reasonable? Are there enough bases remaining for reliable taxonomic assignments, etc.?

Thanks for your time,

Karen

Something weird seems to be going on - filter.seqs does not remove sequences - just positions in the alignment. Here you go from 71322 to 46341 sequences. Are you sure that you’re using the same file as input to summary.seqs and filter.seqs and that the output from filter.seqs is going into summary.seqs?
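
To make the point about positions versus sequences concrete, here is a minimal Python sketch of a column filter in the spirit of vertical=T and trump=. on a toy alignment. It is not mothur's code: the function name `filter_columns` and the toy sequences are made up for illustration, and it assumes that trump=. drops any column in which at least one sequence has a `.` and that vertical=T drops columns that are gaps in every sequence.

```python
def filter_columns(aligned, trump=".", vertical=True):
    """Return a copy of the alignment with filtered columns removed.

    aligned: dict mapping sequence name -> aligned sequence (equal lengths).
    Hypothetical helper for illustration; not part of mothur.
    """
    names = list(aligned)
    if not names:
        return {}
    length = len(aligned[names[0]])
    keep = []
    for col in range(length):
        column = [aligned[name][col] for name in names]
        # trump: drop the column if ANY sequence has the trump character here
        if trump is not None and trump in column:
            continue
        # vertical: drop the column if EVERY sequence has a gap here
        if vertical and all(ch in "-." for ch in column):
            continue
        keep.append(col)
    # Every sequence is kept; only columns are removed
    return {name: "".join(aligned[name][c] for c in keep) for name in names}


# Toy alignment: '.' marks positions outside a sequence, '-' an internal gap
toy = {
    "seqA": "..ACG-TACGT..",
    "seqB": "...CG-TACG...",
    "seqC": ".GACG-TAC....",
}

filtered = filter_columns(toy)
for name, seq in filtered.items():
    print(name, seq)
```

In this toy case all three sequences survive and each is trimmed to the five columns they share (`CGTAC`), so the sequence count before and after the filter should match.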

Sorry, I should have provided more background!
I removed chimeras between the two steps I sent you. The decrease in the # of bases comes when filtering using trump=. after removing chimeric sequences, as in the example.

Still seems weird - the sequences should all be ~120-150 bp. Could you send the input to filter.seqs to mothur.bugs@gmail.com?

The problem seems to be in the alignment. Looking at the sequences, it was clear that some sequences of about the same length occupied only half the alignment space of others. To correct for this, I’d suggest using ksize=6 as a parameter in align.seqs - with these data, the alignments were much improved. Unfortunately, it also means that you’ll need to redo the chimera checking step.
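
A rough way to spot this kind of problem before filtering is to compare how many alignment columns each sequence spans with how many bases it contains. The Python sketch below is only an illustration: the helper names (`span`, `shared_region`, `bases_in_region`) and the toy sequences are hypothetical, and it assumes `.` and `-` are the only gap characters. It also suggests why the posted numbers collapse: every sequence starts at 1044 but the ends range from 1764 to 13855, so if trump=. keeps only columns that every sequence covers, nothing past the earliest end can survive.

```python
GAPS = ".-"  # assumed gap characters in the aligned fasta


def span(aligned_seq):
    """Return (start, end, nbases) for one aligned sequence, 1-based."""
    positions = [i for i, ch in enumerate(aligned_seq, start=1) if ch not in GAPS]
    if not positions:
        return (0, 0, 0)
    return (positions[0], positions[-1], len(positions))


def shared_region(aligned):
    """Return (latest start, earliest end): the columns every sequence covers."""
    spans = [span(seq) for seq in aligned.values()]
    return (max(s[0] for s in spans), min(s[1] for s in spans))


def bases_in_region(aligned_seq, start, end):
    """Count the bases of one sequence that fall inside columns start..end."""
    return sum(1 for ch in aligned_seq[start - 1:end] if ch not in GAPS)


# Two toy sequences with the same number of bases but different spans
toy = {
    "wide":   "ACGT--ACGT--ACGT",
    "narrow": "ACGTACGTACGT....",
}

start, end = shared_region(toy)
for name, seq in toy.items():
    print(name, span(seq), "bases kept:", bases_in_region(seq, start, end))
```

Here both sequences have 12 bases, but "wide" spans 16 columns and "narrow" only 12, so "wide" keeps just 8 bases inside the shared region. A similar mismatch between span and base count across real sequences would be a hint that the alignment, rather than the filter, needs attention.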