[cfe-dev] libc++ red-black tree: halving the branches

Thu Apr 5 00:46:02 PDT 2012

On Thu, Apr 5, 2012 at 5:04 AM, Howard Hinnant <hhinnant at apple.com> wrote:

> On Apr 4, 2012, at 8:58 PM, Sean Silva wrote:
>
> > I came up with a neat little trick (that may have already been
> considered) that halves the number of conditionals in the venerable
> red-black tree rebalancing operations. The key observation is that there is
> a big `if` whose two branches are identical except that "left" and "right"
> are switched.
> >
> > Basically, you use pointer-to-members for each of `left` and `right`,
> and then a single swap reflects the case. You can see this in this
> experimental modification of the code for __tree_balance_after_insert <
> http://pastie.org/3730128> when compared to the current version (it uses
> a unified __tree_rotate that takes a pointer to member for what its notion
> of "left" should be).
>
> Very clever.  I believe this is good code to familiarize one's self with
> in order to add another good tool to the toolbox.
>
> > In my quick experiments I found that this halved the number of branches
> in the generated code (iirc 26 to 13 or something like that).
>
> In counting branches for performance comparisons (as opposed to code size
> comparisons) it is important to count the number of branches executed by
> each path through the routine, as opposed to the total number of branches.

Not only this, but also whether the change impacts the predictability of
the branches to the CPU's branch predictor. This type of change (unifying
code paths to share a set of branches) can have a really bad result of
making all of the branches between 'hard' and 'unpredictable' instead of
them being 'well predicted'. However, in some (more rare) cases it can have
the opposite effect! So, I would want to see very careful measurement of
performance as well. I think it has a small possibility of making no
difference, a small possibility of providing a significant performance
boost, and a larger possibility of providing a significant performance loss.

There is a good tool for this on linux: the perf events tool. It can
measure the number of branches executed in a benchmark program, and the %
which are unpredicted and mispredicted. The worst case here is
mispredicted, the almost-as-bad case is unpredicted.

The other complication is that different chips have different prediction
strategies, so the results my swing wildly between architectures.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20120405/c99497df/attachment.html>