Very high Fs values when running Amova


I am running Amova and Homova to compare populations. My datasets typically look like this: a fasta file with about 1600 sequences and a design file in which the 1600 sequences are assigned to 2 or 3 groups.

The values I get are approximately like this for all runs:

1-5   Among     Within        Total
SS    0.15542   0.0103837     0.165804
df    1         128           129
MS    0.15542   8.11226e-05

Fs: 1915.87
p-value: <0.001*

HOMOVA BValue P-value SSwithin/(Ni-1)_values
1-5 0.100908 0.127 8.40106e-05 7.75253e-0

I did not expect the Fs values to be that high. I have read that Fst values range from 0 to 1, but I see from an earlier post here on the forum that Fs values can be higher than 1. Is this Fs value trustworthy? Is there anything I should look out for that could produce such a high Fs value?
There is an equal number of sequences in group 1 and group 5.

Thanks! Any help will be appreciated!

What you’re showing is that there is considerably more variation between groups 1 and 5 than there is within either 1 or 5. At this point, we generally just use AMOVA/HOMOVA to compare groups of samples to each other, not sets of sequences. Why don’t you think your two communities are significantly different? You should think of the Fs as a non-parametric analog of the F statistic from an ANOVA, so the value would have to be >1 to be significant.
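As a rough check on where the large number comes from, here is a minimal sketch that recomputes Fs from the table in the first post, assuming Fs is the ratio of the among-group mean square to the within-group mean square, as in a classical one-way ANOVA. The variable names are made up for illustration.

```python
# Values copied from the AMOVA table in the first post (groups 1-5).
ss_among, ss_within = 0.15542, 0.0103837
df_among, df_within = 1, 128

ms_among = ss_among / df_among      # mean square among groups
ms_within = ss_within / df_within   # mean square within groups
fs = ms_among / ms_within           # ratio of the two mean squares

print(f"MS among  = {ms_among:.5f}")
print(f"MS within = {ms_within:.5e}")
print(f"Fs        = {fs:.2f}")
```

The result agrees with the reported 1915.87 up to rounding. The within-group mean square is tiny because the within-group sum of squares is spread over 128 degrees of freedom, so even a modest among-group sum of squares gives a very large ratio.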

Hi again!

Thanks for your reply! But can you elaborate on what you mean by “we generally just use AMOVA/HOMOVA to compare groups of samples to each other, not sets of sequences”? We have sets of sequences, but we treat them as different samples…

I have a couple more questions:
We want to compare diversity within groups and between groups, within one sample. A second goal is to compare how much the diversity among the groups (e.g. 1 and 5) differs across several samples. We have, however, different numbers of sequences for the different groups and also for the different samples. We have therefore normalised our datasets and run this pipeline (100 cycles):

  1. Pick random sequences from each group, as many as there are in the smallest group (within one sample)
  2. Make fastafile + designfile
  3. unique.seqs
  4. summary.seqs
  5. dist.seqs
  6. homova
  7. amova
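
Steps 1 and 2 can be sketched roughly as below. This is only an illustration, not the actual script used: the function names and sequence names are hypothetical, and the design file is assumed to be two tab-separated columns (sequence name, group).

```python
import random

def subsample_equal(groups, seed=None):
    """Randomly pick the same number of sequences from every group.

    groups maps a group label to a list of sequence names; the number
    picked per group equals the size of the smallest group.
    """
    rng = random.Random(seed)
    n = min(len(names) for names in groups.values())
    return {g: rng.sample(names, n) for g, names in groups.items()}

def design_lines(picked):
    """Build design-file lines, assumed here to be 'name<TAB>group'."""
    return [f"{name}\t{group}"
            for group, names in picked.items()
            for name in names]

# Hypothetical sequence names: 10 in group 1, 6 in group 5.
groups = {
    "1": [f"seqA{i}" for i in range(10)],
    "5": [f"seqB{i}" for i in range(6)],
}
picked = subsample_equal(groups, seed=42)
print("\n".join(design_lines(picked)))
```

Using a different seed for each of the 100 cycles gives a different random subset each time while keeping every cycle reproducible.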

Based on this, is it possible to directly compare Fs values from the different samples? We have normalised the groups within each sample, but the samples are not normalised to each other. So far we believe we can take the Fs value from the among 1-5 analysis in sample A and compare it to the Fs value from the among 1-5 analysis in sample B. However, samples A and B might differ greatly in the number of sequences that their analyses are based on. Is this a correct approach?

Here is an example of the output and values:
Group  type      Min      Mean      Median   Stddev  Max
1-5    msWithin  0.0      0.0       0.0      0.0     0.0
1-5    fs        2363.38  2393.722  2392.15  25.158  2640.34
1-5    group     1-5
1-5    msAmong   0.216    0.217     0.217    0.0     0.218
1-5    ssTotal   0.232    0.233     0.233    0.0     0.233
1-5    dfWithin  176.0    177.92    178.0    0.306   178.0
1-5    ssAmong   0.216    0.217     0.217    0.0     0.218
1-5    dfAmong   1.0      1.0       1.0      0.0     1.0
1-5    ssWithin  0.015    0.016     0.016    0.0     0.016
1-5    pvalue    0.001    0.001     0.001    0.0     0.001
1-5    dfTotal   177.0    178.92    179.0    0.306   179.0
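
A summary row like the ones above can be produced along these lines. This is a sketch only: the three input values are placeholders rather than the real 100 Fs values, and using the sample standard deviation is an assumption.

```python
import statistics

def summarize(values):
    """Return min, mean, median, sample standard deviation and max."""
    return {
        "min": min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stddev": statistics.stdev(values),  # sample (n - 1) form, assumed
        "max": max(values),
    }

# Placeholder values for illustration, not the real per-cycle results.
fs_values = [2363.38, 2392.15, 2640.34]
summary = summarize(fs_values)
print("  ".join(f"{key}={value:.3f}" for key, value in summary.items()))
```

Note that values such as msWithin show as 0.0 in the table only because of rounding to three decimals; printing them in scientific notation would keep the information.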

And last, I have a question about the amova formula. The mothur web pages say that n is the number of samples per treatment (numSamples/numTreatments), a is the number of treatments, and N is the number of samples. What are samples here, and what are treatments? Are samples the sequences, and treatments the groups (in my case 1 and 5)? Is this correct?
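
One way to test this reading against the output is through the degrees of freedom. In a one-way design they are a - 1 among, N - a within, and N - 1 in total. The sketch below assumes N counts the items in the distance matrix and a counts the groups in the design file; the function name is made up.

```python
def anova_df(num_items, num_groups):
    """Return (dfAmong, dfWithin, dfTotal) for a one-way design."""
    df_among = num_groups - 1
    df_within = num_items - num_groups
    df_total = num_items - 1
    return df_among, df_within, df_total

# The first table reports df = 1, 128, 129, which matches a = 2, N = 130.
print(anova_df(130, 2))
```

Under that assumption, the df values in the first table would correspond to 130 items split over 2 groups, and a dfTotal of 179 in the second table to 180 items.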


What I’m suggesting is…

  1. dist.seqs
  2. cluster
  3. make.shared
  4. dist.shared
  5. homova (on output from dist.shared)
  6. amova (on output from dist.shared)