Heatmap puzzle

During a recent analysis with mothur we got a strange result for our heatmap and it would be great if someone could help us understand it and tell us if we encountered a bug or are just unable to properly interprete the heatmap (of which I don´t have much expertise).

In our analysis we compared the top 20 OTUs from three treatments and the number of sequences in the OTUs (as seen in the shared file used for creating the heatmap) do not seem to match the colouring in the heatmap. For example:

A B C
OTU005 61 451 449
OTU006 0 0 449
OTU007 0 0 419

In the heatmap treatment C shows for example a light red for OTUs 005 & 006 but black for the very similar number of sequences at OTU007. OTU005 at treatment B, which is again nearly identical to the numbers in C, shows also a much darker red than those.

We did this analysis on the Windows and Linux versions of mothur (both V1.32) with the following commands:

Starting from our processed and quality controled final fasta, names & groups files, we excluded some samples with remove.groups resulting in “pick” files. We also created a new groups file in which we combined samples that belong to the same treatment. After that we used:

classify.seqs(fasta=final.pick.fasta, template=SSURef111bacteria.fasta, taxonomy=SSURef111bacteria.tax)
cluster.split(fasta=final.pick.fasta, taxonomy=final.pick.SSURef111bacteria.wang.taxonomy, name=final.pick.names, processors=2, taxlevel=2)
make.shared(list=final.pick.an.list, group=FINAL1.groups)
heatmap.bin(shared=final.pick.an.shared, numotu=20)

The number of sequences in the 20 most abundant OTUs were taken from the shared file (final.pick.an.shared) at label 0.03 and (of course) compared to the heatmap of the same level.

Any help in understanding this output and putative discrepancy would be greatly appreciated.

René

By default, mothur uses a log10 scaler on the relative abundance calculation of the colors. Here is a small piece of the source:

      if (lookup[i]->getAbundance(j) != 0) { //don't want log value of 0.
     if (scaler == "log10") {
                        if (maxRelAbund[i] == 1) { maxRelAbund[i] += 0.001; }
      scaleRelAbund[i][j] = toHex(int(255 * log10(relAbund) / log10(maxRelAbund[i]))) + "0000";
     }else if (scaler == "log2") {
                        if (maxRelAbund[i] == 1) { maxRelAbund[i] += 0.001; }
      scaleRelAbund[i][j] = toHex(int(255 * log2(relAbund) / log2(maxRelAbund[i]))) + "0000";  
     }else if (scaler == "linear") {
      scaleRelAbund[i][j] = toHex(int(255 * relAbund / maxRelAbund[i])) + "0000";  
     }else {  //if user enters invalid scaler option.
                        if (maxRelAbund[i] == 1) { maxRelAbund[i] += 0.001; }
      scaleRelAbund[i][j] = toHex(int(255 * log10(relAbund / log10(maxRelAbund[i]))))  + "0000"; 
     } 
  }else { scaleRelAbund[i][j] = "FFFFFF";  }

If you want to see a linear representation of the heatmap, then run heatmap.bin(shared=yourSharedFile, scale=linear).

Hi Sarah,

thanks for your reply.

I tried the linear and log2 options of the command, but with the linear output everything but the first three OTU lines are pitch black while the log2 output does not seem to differ from the log10.

So you are saying that the colour differences come from the scaling, even when the numbers are very similar (451 versus 449 & 449 versus 419 in my example)? Is this an effect of the differences in abundance between the top three OTUs with sequence numbers of 3000-4000 versus the other 17 in the analysis with 400 and below? As I mentioned, I have not much experience with heatmaps, so I hope you forgive my stupid questions.

Cheers,

René

Hi Rene,
There are no stupid questions, :). I am happy to help. For all options mothur finds the maximum relative abundance for each group, and the relative abundance for each group in each OTU. In all cases the relative abundance is divided by the maximum relative abundance. The scalers add a layer of complexity. Let’s look at your example in concrete numbers:

label Group numOtus OTU1 OTU2 OTU3
0.03 A 3 61 0 0
0.03 B 3 451 0 0
0.03 C 3 449 449 419

group OTU relabund maxRelabund (relabund/maxrelabund)
A OTU1 1 1 1
B OTU1 1 1 1
C OTU1 0.340926 0.340926 1
C OTU2 0.340926 0.340926 1
C OTU3 0.318147 0.340926 0.933185

The entire picture is almost all red, with the exception of the 0 abundances which are white. If you are seeing black you are likely using an old version of mothur with a bug where the colors were reversed.

Now the log10 scale:

group OTU relabund maxRelabund (log10(relabund)/log10(maxrelabund))
A OTU1 1 1.001 0
B OTU1 1 1.001 0
C OTU1 0.340926 0.340926 1
C OTU2 0.340926 0.340926 1
C OTU3 0.318147 0.340926 1.06426

For groups A and B, because all your sequences are in one OTU the maxRelabund = 1 and the log10(1)=0. This is causing a bit of a weird result. I will bring this special case up with Pat.

Thanks again Sarah, that helped me understanding the problem.

I would be interesting too to know what Pat thinks of my “weird result” caused by the OTU distribution in my groups.

Hey!

I also have 8 libraries (sorry, still Sanger sequences) with few clones each (normalized dataset to a maximum of 96 seqs per sample) and they happen to be dominated by one to three OTUs and then a lot of 2-3 and unique OTUs. When I tried to build the heatmap, I have the same issue. Should I use the linear option? Is there any solution Pat proposed for this problem?

And one more question, only black and red colors can be used or there is a way to make the plots look more colorfull?

Thanks a lot!

Susana

It would probably be best to build the heat maps in R. They have far more resources available for making pretty heat maps and we really aren’t interested in reimplementing R.

Pat

Thank you Pat for your answer, I will try that.

But if I use the Mothur heatmap command, due to all the above explained about the distribution of the sequences among the OTUs defined, should I use linear, log2 or log10?

whatever looks best!

Thanks! :mrgreen: I decided linear and black and red, and it looks fine for me now :wink: