Usefulness of remove.rare for reducing bias in analysis

Hello,

I have some 16S rRNA gene sequence data targeting the V3 region (58 bp) from human gut bacteria, sequenced on a MiSeq. I have processed all my data following a variation of the SOP on this site, but as I look at reducing bias in my analysis, I’ve come across the question of whether removing rare sequences is necessary for good data analysis. I will of course subsample my samples to remove differences in the number of sequences per group, but I am not finding any solid publications on how useful it is to remove these rare sequences. When I compare no removal of rare sequences against remove.rare with nseqs=2 and with nseqs=5, the observed number of OTUs (sobs) drops substantially, which suggests a large number of low-abundance OTUs in these samples.
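To make the effect of the thresholds concrete, here is a minimal Python sketch of rare-OTU filtering on a single sample. It is not mothur's implementation: it assumes an OTU counts as rare when its abundance is at or below `nseqs`, and it applies the threshold to one sample's counts, whereas mothur may apply it differently (for example, to pooled counts). The function names `remove_rare_otus` and `sobs` and the toy counts are hypothetical.

```python
from typing import Dict


def remove_rare_otus(counts: Dict[str, int], nseqs: int) -> Dict[str, int]:
    """Drop OTUs whose abundance is at or below nseqs (assumed rule)."""
    return {otu: n for otu, n in counts.items() if n > nseqs}


def sobs(counts: Dict[str, int]) -> int:
    """Observed richness: number of OTUs with at least one sequence."""
    return sum(1 for n in counts.values() if n > 0)


# Hypothetical toy sample: a few abundant OTUs plus a tail of
# singletons and doubletons.
sample = {"Otu1": 120, "Otu2": 45, "Otu3": 9, "Otu4": 5, "Otu5": 3,
          "Otu6": 2, "Otu7": 2, "Otu8": 1, "Otu9": 1, "Otu10": 1}

for label, threshold in (("none removed", 0), ("nseqs=2", 2), ("nseqs=5", 5)):
    kept = remove_rare_otus(sample, threshold)
    print(f"{label}: sobs={sobs(kept)}")
```

With these toy counts it prints sobs of 10, 5 and 3: half of the OTUs disappear at nseqs=2 even though they hold only a handful of the sequences.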

Observed richness (sobs) with no rare-sequence removal and after remove.rare with nseqs=2 and nseqs=5:

label group sobs(none) sobs(nseqs=2) sobs(nseqs=5)
unique HGAM1 2900 1846 1482
unique HGAM2 2632 1817 1456
unique HGAM3 3044 2093 1643
unique HGAM4 2476 1785 1466
unique HGAM5 1889 1508 1293
unique HGAM51 1094 891 766
unique HGAM52 943 808 730
unique HGAM53 1379 1129 1004
unique HGAM54 930 806 717
unique HGAM55 1111 916 813
unique HGAM6 1682 1317 1134
unique HGAM61 1343 1132 1029
unique HGAM62 1114 953 843
unique HGAM63 1262 1074 969
unique HGAM64 1299 1101 1012
unique HGAM65 1596 1162 954
unique HGAM7 1744 1417 1224
unique HGAM71 1356 1074 913
unique HGAM72 1192 937 749
unique HGAM73 1098 843 698
unique HGAM74 1107 897 754
unique HGAM75 1288 963 786
unique HGAMTF 2384 1338 1064
unique PGAM1 2096 1331 1079
unique PGAM2 2515 1561 1259
unique PGAM3 2610 1641 1312
unique PGAM4 1855 1371 1147
unique PGAM5 1762 1289 1084
unique PGAM6 2075 1411 1178
unique PGAMTF 3745 1417 1011
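The size of the drop varies a lot between groups. One quick way to compare them is to express each filtered sobs as a percentage loss relative to the unfiltered value. The Python sketch below does this for three rows copied from the table (the rows are an arbitrary pick, and `percent_drop` is a hypothetical helper name).

```python
# Observed richness (sobs) copied from three rows of the table:
# (none removed, nseqs=2, nseqs=5).
rows = {
    "HGAM1": (2900, 1846, 1482),
    "HGAM52": (943, 808, 730),
    "PGAMTF": (3745, 1417, 1011),
}


def percent_drop(before: int, after: int) -> float:
    """Percentage of OTUs lost relative to the unfiltered sobs."""
    return 100.0 * (before - after) / before


for group, (none, n2, n5) in rows.items():
    print(f"{group}: -{percent_drop(none, n2):.1f}% at nseqs=2, "
          f"-{percent_drop(none, n5):.1f}% at nseqs=5")
```

HGAM1 loses about 36% of its OTUs at nseqs=2, HGAM52 about 14%, and PGAMTF about 62%, so the filter does not affect all groups equally.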

When I look at NMDS plots for these three treatments, a few of my groups ordinate differently with no removal than with removal at nseqs=2 or nseqs=5. There isn’t much difference between the nseqs=2 and nseqs=5 plots. Can anyone offer advice (or point me toward some papers) on whether removing rare sequences is useful for community analyses?

Thanks for any advice!

I (Pat) see no reason to remove rare sequence types beyond what is already done by subsampling. People have the idea that rare sequences are bad, but in fact some of the more abundant sequences, even ones that show up in multiple samples, are chimeras or sequencing artifacts.
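Subsampling already thins out the rarest sequence types, because an OTU seen only once in a sample has a good chance of not being drawn at a lower depth. Below is a minimal Python sketch of rarefying one sample to a fixed depth, assuming simple random sampling without replacement. It is not mothur's sub.sample code, and the function name `subsample` and the counts are hypothetical.

```python
import random
from collections import Counter
from typing import Dict


def subsample(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Randomly draw `depth` sequences without replacement (rarefying)."""
    if depth > sum(counts.values()):
        raise ValueError("depth exceeds the number of sequences in the sample")
    # Expand counts into one entry per sequence, then sample without replacement.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


# Hypothetical sample: 2 abundant OTUs plus 50 singletons (250 sequences).
sample = {"Otu1": 150, "Otu2": 50}
sample.update({f"Rare{i}": 1 for i in range(50)})

sub = subsample(sample, depth=100)
print(len(sample), len(sub))  # sobs before vs. after subsampling
```

Drawing 100 of 250 sequences keeps each singleton with probability 0.4, so most of the 50 singletons are expected to vanish without any explicit rare-OTU filter.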