My rRNA sequence dataset has 4 independent replicates x 2 locations x 3 seasons x 2 host plant species, but the number of sequences per sample varies between 500 and 7000. UniFrac analysis showed biologically meaningful clustering, but I am wondering whether my analysis is right or not.
Do I have to extract a subset with an equal number of sequences from each sample and repeat the analysis on that?
I would suggest running sub.sample to bring every sample down to 500 sequences and then repeating the analysis. Not only will things get thrown off by a >10-fold difference in sampling effort, but you will also have a >10-fold difference in the number of erroneous sequences that show up in your datasets. In general, when you are comparing communities, by any metric, you want them to have the same number of sequences.
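To illustrate the idea outside of mothur, here is a minimal Python sketch of even-depth subsampling. It is not what sub.sample does internally; the function name `subsample_groups`, the dictionary layout, and the toy sequence names are all made up for this example. It draws sequences at random without replacement, using the smallest sample size as the default depth.

```python
import random

def subsample_groups(seqs_by_sample, depth=None, seed=0):
    """Randomly draw the same number of sequences from every sample.

    seqs_by_sample: dict mapping sample name -> list of sequence names
                    (hypothetical layout, for illustration only).
    depth: sequences to keep per sample; defaults to the smallest sample.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    if depth is None:
        depth = min(len(seqs) for seqs in seqs_by_sample.values())
    even = {}
    for sample, seqs in seqs_by_sample.items():
        if len(seqs) < depth:
            continue  # too few sequences for the requested depth; skip it
        even[sample] = rng.sample(seqs, depth)  # sampling without replacement
    return even

# Toy data mirroring the 500 vs 7000 spread described above
samples = {
    "A": [f"A_seq{i}" for i in range(7000)],
    "B": [f"B_seq{i}" for i in range(500)],
}
even = subsample_groups(samples)
```

After this step every sample contributes 500 sequences, so any difference between communities is no longer confounded with sampling effort.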
Thanks very much for your suggestion. I made sure that I am now working with an equal number of sequences per sample. The output looks more meaningful.
Similar to your data, I have sequencing data for 3 independent replicates for each of 2 treatments, so I have 6 samples in my group file. I want to compare the two treatments as a whole rather than the individual replicates (say, samples A, B, C versus D, E, F). May I know how you ran the UniFrac command to compare treatments?
I suppose you could merge the replicates (A with B with C) and create a new group file with 2 groups (ABC, DEF), then repeat the analysis.
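As a rough sketch of that relabeling, the Python below rewrites group assignments so that each replicate name is replaced by its treatment name. It assumes the group file is two tab-separated columns (sequence name, then group name); the function `merge_groups`, the `treatment_of` mapping, and the sequence names are hypothetical and only for illustration.

```python
def merge_groups(group_lines, treatment_of):
    """Replace each replicate label with its treatment label.

    group_lines: lines of the form "<sequence name>\t<group name>"
                 (assumed group file layout).
    treatment_of: dict mapping replicate name -> merged treatment name.
    """
    merged = []
    for line in group_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        seq_name, sample = line.split("\t")
        merged.append(f"{seq_name}\t{treatment_of[sample]}")
    return merged

# Replicates A, B, C form one treatment; D, E, F form the other
treatment_of = {
    "A": "ABC", "B": "ABC", "C": "ABC",
    "D": "DEF", "E": "DEF", "F": "DEF",
}
group_lines = ["seq1\tA", "seq2\tB", "seq3\tD", "seq4\tF"]
merged = merge_groups(group_lines, treatment_of)
```

Writing the `merged` lines out to a new file gives a group file with only the two treatment groups, which can then be used in place of the original one.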