Hello all,
I have a couple of 16S datasets obtained from gut biopsies that I am trying to analyze. However, in both datasets I am getting really poor classification past the phylum level. I’ve noticed that following the screening/filtering steps after the alignment, I end up with really short sequences (from 186bp before the alignment to 67bp after filtering). I have tried playing around with the screen.seqs parameters, and have also tried aligning to the GreenGenes database, but I keep getting the same results. I have the same problem with both datasets.
For example, here is a summary of what I have following the alignment step in one of my datasets:
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1044 1053 5 0 2 1
2.5%-tile: 1044 1213 55 0 3 1728
25%-tile: 1044 2532 123 0 4 17272
Median: 1044 4978 197 0 4 34544
75%-tile: 1044 5413 246 0 5 51815
97.5%-tile: 1044 6409 322 0 6 67359
Maximum: 43111 43116 360 0 7 69086
Mean: 1077.25 4201.27 186.622 0 4.268
of Seqs: 69086
After running: screen.seqs(fasta=calgary.trim.unique.align,name=calgary.trim.names,group=calgary.groups,start=1044,optimize=end,criteria=85) filter.seqs(fasta=calgary.trim.unique.good.align,vertical=T,trump=.,processors=12)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 237 45 0 2 1
2.5%-tile: 1 237 52 0 2 1565
25%-tile: 1 237 65 0 3 15648
Median: 1 237 68 0 4 31296
75%-tile: 1 237 71 0 4 46944
97.5%-tile: 1 237 75 0 4 61027
Maximum: 1 237 108 0 6 62591
Mean: 1 237 67.2747 0 3.46726
of Seqs: 62591
Does anyone have any ideas what could be going wrong here?
Thanks in advance!
Kyle