Selecting sequences for unifrac analysis.

dwaite · September 13, 2011, 9:39am

Hi,

I’m working on some 16S sequence data obtained from several sources, and I’m using mothur for my statistical analysis. I’ve followed through the esophageal community analysis example and have everything working fine.

I just had a question regarding the unifrac command. Some of the sequence data I’m using came from an earlier study and has significantly fewer sequences to work with. I’m aware that unifrac can be influenced by differences in the number of sequences per group, so I was wondering about the best way to select a subset of the larger sequence pool to match this smaller set.

When I compare two of the groups in my data with an unweighted unifrac I get a p-value of 0.04, which is just a little bit too high to accept a significant difference (I would need a p-value of less than 0.0167), but since it’s coming quite close to the threshold I want to be sure that the groups aren’t being pushed apart by a bias in the methodology.

The way I currently see it, my options are either to manually select what I consider to be a representative sample of each pool, so that I end up with the same number of sequences in each group, or to just randomly take a subset of each group (and probably repeat the process a few times to try to avoid any skew). To take that even further, I could make multiple subsets of the larger sequence pool and then compare them to test for significant differences caused by the selection process.

Any advice on, or improvements to, what I’ve suggested would be great, thanks.

pschloss · October 4, 2011, 12:38pm

You might try out the sub.sample command, which is documented in the wiki. This command will allow you to specify the number of sequences in each group. You can go from there with your analysis.
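
For anyone who wants to see the idea of random subsampling to an equal number of sequences per group spelled out, here is a minimal Python sketch. It is not mothur's implementation of sub.sample, just an illustration of the approach described in the question. The function name `subsample_groups`, the group labels, and the sequence counts are all made up for the example.

```python
import random
from collections import defaultdict


def subsample_groups(seq_to_group, size=None, seed=None):
    """Randomly keep the same number of sequences from every group.

    seq_to_group: dict mapping sequence name -> group name
                  (the same information a group file carries).
    size: sequences to keep per group; defaults to the smallest group.
    seed: optional seed so a given replicate can be reproduced.
    """
    rng = random.Random(seed)

    # Collect the sequence names that belong to each group.
    by_group = defaultdict(list)
    for name, group in seq_to_group.items():
        by_group[group].append(name)

    if size is None:
        size = min(len(names) for names in by_group.values())

    picked = {}
    for group, names in sorted(by_group.items()):
        if len(names) < size:
            raise ValueError(f"group {group!r} has fewer than {size} sequences")
        # Sample without replacement; sorting first keeps runs reproducible.
        for name in rng.sample(sorted(names), size):
            picked[name] = group
    return picked


# Hypothetical data: a small older study and a much larger newer one.
seq_to_group = {f"old_{i}": "old_study" for i in range(50)}
seq_to_group.update({f"new_{i}": "new_study" for i in range(400)})

# Several independent random subsamples, one per seed.
replicates = [subsample_groups(seq_to_group, seed=s) for s in range(5)]
```

Each replicate could then be written out and run through the unifrac analysis separately. If the p-values swing a lot from one replicate to the next, that would suggest the selection step itself is influencing the result.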
