[cfe-dev] libc++ red-black tree: halving the branches
silvas at purdue.edu
Wed Apr 4 17:58:35 PDT 2012
I came up with a neat little trick (that may have already been considered)
that halves the number of conditionals in the venerable red-black tree
rebalancing operations. The key observation is that there is a big `if`
whose two branches are identical except that "left" and "right" are
Basically, you use pointer-to-members for each of `left` and `right`, and
then a single swap reflects the case. You can see this in this experimental
modification of the code for __tree_balance_after_insert <
http://pastie.org/3730128> when compared to the current version (it uses a
unified __tree_rotate that takes a pointer to member for what its notion of
"left" should be).
In my quick experiments I found that this halved the number of branches in
the generated code (iirc 26 to 13 or something like that). The code size
reduction wasn't as drastic, but it was still ~30% (I'm guessing the
optimizer usually is doing some heroics to fold some commonalities, but
still failing to recognize the fundamental sense of the commonality).
Correct me if I'm wrong, but the way that __tree_rebalance_after_insert and
friends are written means that only one copy of that routine should appear
in the final binary, so the code size savings is a fixed constant; (or is
the intent to have it get inlined into __tree<...>::__insert_node_at ?). I
don't have libcxx set up as my stdlib currently so I haven't been able to
compile a large program (e.g. LLVM/clang) with it and see how things go
down in that regard.
Possible cons off the top of my head:
- Increased register pressure. In the experimental version that I posted,
the left and right pointer-to-members each get put in their own register,
adding two registers of pressure. This can be improved by keeping just one
of them and using xor to switch sides as necessary.
- On architectures with only very simple addressing modes, this might
require extra instructions for each memory access.
I have yet to do any performance or holistic code-size measurements, but I
want to solicit feedback on the idea to see if it might be a path worth
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the cfe-dev