Split.abund cutoff theory


I am trying to make a heatmap that is visually discernible for a dataset of about 7,000 OTUs, which requires some sort of filtering to bring it down to about 50 or 100 OTUs.

Do you have recommendations (or suggested papers) on whether it is “better” to remove OTUs on a system-wide basis (pooling across all samples in the set and removing every OTU that has fewer than X sequences or falls below some relative abundance threshold) or on a per-sample basis (applying a count or percent cutoff to each sample separately)?

I would also welcome any general recommendations or caveats for this type of filtering.

Thank you!

I don’t “believe” in removing rare OTUs for your actual analysis. But if your goal is visualization, then you should ask yourself what you’re trying to show. Is it a sample-wide pattern, a system-wide pattern, or something else? Then go from there in designing your visualization.
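
To make the difference between the two filtering approaches in the question concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the toy counts, the 6% cutoff, and the function names are hypothetical, and it is not how split.abund itself is implemented. It only shows that a pooled cutoff and a per-sample cutoff can keep different OTUs.

```python
# Hypothetical toy OTU table: sample name -> {OTU name: sequence count}.
counts = {
    "sampleA": {"Otu1": 900, "Otu2": 95, "Otu3": 5, "Otu4": 0},
    "sampleB": {"Otu1": 10, "Otu2": 20, "Otu3": 0, "Otu4": 970},
}


def keep_system_wide(table, cutoff):
    """Keep OTUs whose share of all sequences, pooled across samples, meets the cutoff."""
    pooled = {}
    for otus in table.values():
        for otu, n in otus.items():
            pooled[otu] = pooled.get(otu, 0) + n
    grand_total = sum(pooled.values())
    if grand_total == 0:
        return set()
    return {otu for otu, n in pooled.items() if n / grand_total >= cutoff}


def keep_per_sample(table, cutoff):
    """Keep OTUs whose relative abundance meets the cutoff in at least one sample."""
    kept = set()
    for otus in table.values():
        total = sum(otus.values())
        if total == 0:
            continue  # an empty sample cannot contribute any OTU
        kept.update(otu for otu, n in otus.items() if n / total >= cutoff)
    return kept


cutoff = 0.06  # assumed threshold, chosen only to make the two results differ
system_wide = keep_system_wide(counts, cutoff)
per_sample = keep_per_sample(counts, cutoff)

print("system-wide:", sorted(system_wide))
print("per-sample: ", sorted(per_sample))
print("only kept per-sample:", sorted(per_sample - system_wide))
```

In this toy table Otu2 makes up 9.5% of sampleA but under 6% of the pooled sequences, so it survives the per-sample filter and is dropped by the system-wide one. Which behavior you want depends on whether the heatmap is meant to show what dominates the whole system or what matters within individual samples.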