[PATCH] [PGO] Hoist hot case statement from switches

Sean Silva chisophugis at gmail.com
Mon Mar 30 21:20:19 PDT 2015


In http://reviews.llvm.org/D5786#149403, @hans wrote:

> In http://reviews.llvm.org/D5786#149384, @silvas wrote:
>
> > In http://reviews.llvm.org/D5786#149001, @hans wrote:
> >
> > > The direction I'd like to explore with my patch however, is to balance the binary tree based on profile info rather than node count. That would put hotter cases closer to the root. If f(x) is the number of branches needed to reach case x, I think this approach would minimize the expected value of f(x) when x is a random variable with a distribution matching the profile info.
> >
> >
> > Sort of random, but in case it saves you a bunch of time, I believe this exact problem is covered in Knuth (either volume 1 or 3, I forget). The keyword is "optimal binary search tree".
>
>
> Thanks! It sounds like what I'm proposing is this approach: http://en.wikipedia.org/wiki/Optimal_binary_search_tree#Mehlhorn.27s_approximation_algorithm
>
> I guess it was bold of me to claim my idea guarantees the minimum expected value, but this makes me think it's still a good idea.
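To make the weight-balancing idea concrete, here is a small hypothetical sketch (names and layout are illustrative, not from the patch): given the switch cases sorted by value, each carrying a profile weight, pick the pivot that best balances the total weight on each side. Recursing on such pivots yields a search tree where hot cases end up near the root, in the spirit of Mehlhorn's approximation for near-optimal binary search trees.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct CaseInfo {
  int Value;        // case label (cases sorted ascending by Value)
  uint64_t Weight;  // profile count for this case
};

// Returns the index at which to split [Begin, End): cases [Begin, Pivot)
// go to the left subtree, [Pivot, End) to the right. We choose the pivot
// that minimizes the difference in total profile weight between the sides.
size_t pickWeightBalancedPivot(const std::vector<CaseInfo> &Cases,
                               size_t Begin, size_t End) {
  uint64_t Total = 0;
  for (size_t I = Begin; I < End; ++I)
    Total += Cases[I].Weight;

  uint64_t LeftSum = 0;
  size_t BestPivot = Begin + 1;
  uint64_t BestImbalance = UINT64_MAX;
  for (size_t Pivot = Begin + 1; Pivot < End; ++Pivot) {
    LeftSum += Cases[Pivot - 1].Weight;
    uint64_t RightSum = Total - LeftSum;
    uint64_t Imbalance =
        LeftSum > RightSum ? LeftSum - RightSum : RightSum - LeftSum;
    if (Imbalance < BestImbalance) {
      BestImbalance = Imbalance;
      BestPivot = Pivot;
    }
  }
  return BestPivot;
}
```

Note that a very hot case dominates its side of the split, so it is reached after only one or two comparisons, whereas node-count balancing would ignore it.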


Ultimately, whether it's a good idea depends on your model of performance.

As a possibly realistic example, we may not want to optimize the expected depth of the binary tree, but rather the predictability of the branches along each path from the root to the pieces of code in the leaves (weighted appropriately by profile counts, etc.). In that case, what we want to infer from the data is not "how likely are we to go to this case", but rather "how likely is a particular choice of branching scheme to result in well-predicted branches". The tree structure could then be chosen to minimize the entropy of each branch, as a rough approximation of branch-prediction cost.
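As a rough illustration of the entropy idea (purely hypothetical, not anything the patch does): a conditional branch taken with probability P has binary entropy H(P); a split near 0 or 1 is cheap for the predictor, while a 50/50 split costs a full bit. A tree builder could score candidate pivots by the profile-weighted entropy of the branches they introduce.

```cpp
#include <cmath>

// Binary entropy, in bits, of a branch taken with probability P.
// H(0) = H(1) = 0 (perfectly predictable); H(0.5) = 1 (worst case).
double branchEntropy(double P) {
  if (P <= 0.0 || P >= 1.0)
    return 0.0;
  return -P * std::log2(P) - (1.0 - P) * std::log2(1.0 - P);
}
```

Under this score, a pivot that sends 90% of executions one way is preferred over one that splits the weight evenly, even if the even split gives a shallower tree.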

It's all about your model of the resulting performance. Your current approach seems to be based on a model where, once we have committed to a binary search tree, tree depth is the sole determining factor (i.e. each branch takes constant time). That sounds reasonable, but it might be worth revisiting in light of modern microarchitectures. Perhaps the entropy of each branch should be taken into account. Or maybe there is something fancier than entropy we could use here, such as a Markov model; that would require improving the profiling tooling, I think, so that it can collect conditional probabilities.
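The constant-time-branch model above reduces to minimizing the expected number of branches, E[f(x)] = sum over cases x of p(x) * depth(x). A minimal sketch of that cost function (illustrative names, not from the patch):

```cpp
#include <cstddef>
#include <vector>

// Expected number of branches to dispatch a switch, given each case's
// probability and its depth (branch count from the root) in a candidate tree.
double expectedDepth(const std::vector<double> &Prob,
                     const std::vector<int> &Depth) {
  double E = 0.0;
  for (size_t I = 0; I < Prob.size(); ++I)
    E += Prob[I] * Depth[I];
  return E;
}
```

For instance, with probabilities {0.7, 0.1, 0.1, 0.1}, a perfectly balanced tree (all depths 2) costs 2 expected branches, while a skewed tree with the hot case at depth 1 (depths {1, 2, 3, 3}) costs only 1.5, which is exactly the trade-off the profile-guided balancing exploits.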

Switch-table extraction and bit tests are existing examples where we model the performance of handling particular situations.

Ok... done braindumping now...


http://reviews.llvm.org/D5786
