[LLVMdev] Lowering switch statements with PGO

Chad Rosier mcrosier at codeaurora.org
Tue Dec 23 08:26:42 PST 2014


> After messing around with this a bit recently, I've come to the following
> conclusions:
>
> 1. One issue is at what granularity to consider the PGO weightings. For
> example, should a set of 50 consecutive cases with 65% of the total weight
> distributed approximately evenly, and which can be lowered to a table
> lookup, be handled before the 3 remaining cases that have 5%, 10%, and 20%
> probability, respectively?

It seems reasonable to consider the lookup table as the 'hottest' case
(per your next comment).

> 2. Rather than saying "hot case" to mean the case with the most weight, we
> might want to talk about a different "unit" in which to aggregate the
> weights (such as in 2., considering the whole lookup table as one).

I agree with this point.

> 3. The primary advantages of the tree approach I suggested are likely to
> come from having a more "global" picture of the set of cases at each step
> in the lowering and from doing all the computation up front.
>
> In a larger scope, I'm convinced that in order to make appropriate
> tradeoffs, we probably need to analyze lots of real-world code.

Agreed.

> For example:
>
> - some use cases of `switch` are extremely performance-critical, like
> lexers/interpreters, while at the other end of the spectrum things like
> enum->string conversions for printing are less so (and should probably be
> optimized for size).
> - some uses of switches do very different things in each of their case
> destinations (and use fallthrough, which must be exploited properly by the
> lowering), while others have very similar code in their destinations (the
> entire switch might actually be describing a mapping on data, rather than a
> control flow construct per se; e.g. a table lookup (dense indices) or
> binary search in a static table (sparse indices) might be a great
> lowering; see the sketch after this list).
> - some switches have multiple case values that map to each destination
> while others have a 1-1 mapping.
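
To make the dense vs. sparse point above concrete, here is a minimal C++
illustration (the functions and values are made up purely for exposition;
nothing here is tied to what the current lowering actually produces):

  // Dense case values (0..4): the switch is really a data mapping, and a
  // compiler can plausibly lower it to an array indexed by 'c'.
  const char *colorName(int c) {
    switch (c) {
    case 0: return "red";
    case 1: return "green";
    case 2: return "blue";
    case 3: return "cyan";
    case 4: return "magenta";
    default: return "unknown";
    }
  }

  // Sparse case values: a lookup table would be mostly holes, so a binary
  // search over a sorted static table (or a compare tree) is a better fit.
  int statusToErrno(int status) {
    switch (status) {
    case 7:    return 1;
    case 113:  return 5;
    case 4099: return 22;
    default:   return 0;
    }
  }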
>
> Those are some ways that switches can vary, off the top of my head. I think
> it would be really valuable if somebody who has easy access to compiling a
> large amount of real-world code with a custom compiler could collect some
> data (my understanding is that Google's internal codebase is a good
> candidate).
>

(Sadly) My world consists of SPEC2K and SPEC2K6, so I don't have much to
offer here. :/

>
> -- Sean Silva
>
> On Tue, Dec 16, 2014 at 12:17 PM, Sean Silva <chisophugis at gmail.com>
> wrote:
>>
>>
>>
>> On Tue, Dec 16, 2014 at 7:23 AM, Chad Rosier <mcrosier at codeaurora.org>
>> wrote:
>>>
>>> > On Mon, Dec 15, 2014 at 2:26 PM, Chad Rosier <mcrosier at codeaurora.org>
>>> > wrote:
>>> >>
>>> >> > On Mon, Dec 15, 2014 at 9:57 AM, Chad Rosier <mcrosier at codeaurora.org>
>>> >> > wrote:
>>> >> >> All,
>>> >> >> About two months ago I posted a patch that hoisted the hottest case
>>> >> >> statement from a switch statement during ISelLowering.
>>> >> >>
>>> >> >> See: http://reviews.llvm.org/D5786
>>> >> >>
>>> >> >> Sean was rather adamant about using a Huffman tree (and I agree this
>>> >> >> is a more complete solution), so I'd like to put a patch together.
>>> >> >
>>> >> > I think this sounds really cool!
>>> >> >
>>> >> >> That being said, I have a few questions.
>>> >> >>
>>> >> >> The current switch lowering code sorts based on case values and is
>>> >> >> lowered with a combination of binary trees, lookup tables, bit tests
>>> >> >> and magic.  If we lower the switch based on branch probabilities, I
>>> >> >> think the most reasonable approach would be to disable the lookup
>>> >> >> tables and stick with a purely tree structure.  Does anyone object or
>>> >> >> have opinions on this matter?
>>> >> >
>>> >> > Spontaneously I would have thought the Huffman tree would just
>>> >> > replace the binary trees, i.e. we'd use the branch probabilities to
>>> >> > build the tree differently.
>>> >>
>>> >> The current implementation selects the pivot in such a way that it
>>> >> optimizes for lookup tables.
>>> >
>>> >
>>> > This seems like a potential point of friction: for lookup tables (and
>>> > other techniques) you want to inherently keep the cases sorted by their
>>> > values, but a Huffman tree does not care about the actual values; it
>>> > only cares about their relative probabilities.
>>>
>>> Exactly!  I can think of lots of ways to deal with this "friction", but
>>> none of them sit well.  As you mentioned, building a Huffman tree and then
>>> dealing with corner cases would be one approach.  Another, which was
>>> Owen's suggestion, would be to peel the hot cases and then fall back to
>>> the current logic.
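
To sketch what that peeling might look like: the names below (CaseCluster,
emitCompareAndBranch, lowerWithExistingLogic) are hypothetical placeholders
rather than actual LLVM APIs, and the 60% threshold is an arbitrary knob:

  #include <algorithm>
  #include <cstdint>
  #include <vector>

  struct CaseCluster {
    int64_t Value;    // case value
    uint64_t Weight;  // profile weight for this case
  };

  // Stand-ins for existing lowering machinery (hypothetical).
  void emitCompareAndBranch(const CaseCluster &C);
  void lowerWithExistingLogic(std::vector<CaseCluster> Cases);

  // Peel the hottest cases up front, then hand the remainder back to the
  // existing value-sorted lowering.  HotFraction is a tunable knob.
  void lowerSwitch(std::vector<CaseCluster> Cases, double HotFraction = 0.6) {
    uint64_t Total = 0;
    for (const CaseCluster &C : Cases)
      Total += C.Weight;

    // Hottest cases first.
    std::sort(Cases.begin(), Cases.end(),
              [](const CaseCluster &A, const CaseCluster &B) {
                return A.Weight > B.Weight;
              });

    uint64_t Peeled = 0;
    size_t I = 0;
    while (I != Cases.size() && Total != 0 &&
           (double)Peeled / (double)Total < HotFraction) {
      emitCompareAndBranch(Cases[I]); // explicit test for one hot case
      Peeled += Cases[I].Weight;
      ++I;
    }

    // Restore value order for the leftovers and fall back.
    std::vector<CaseCluster> Rest(Cases.begin() + I, Cases.end());
    std::sort(Rest.begin(), Rest.end(),
              [](const CaseCluster &A, const CaseCluster &B) {
                return A.Value < B.Value;
              });
    lowerWithExistingLogic(Rest);
  }

The interesting policy question is the stopping condition: a fixed fraction
like this is probably too crude, and something that also looks at how
table-friendly the remaining cases are would likely do better.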
>>>
>>
>> Another possibility might be to use a hybrid approach.
>>
>> For example, while building the Huffman tree, propagate various
>> information up the tree as you build it. Then you can very easily start at
>> the top of the Huffman tree and have all this information available to you
>> (e.g. "peel off the most probable cases until the relative probability has
>> some relation to the subtree case density").
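
As a minimal sketch of that first variant: build the Huffman tree with a
priority queue keyed on weight, and have every merge also combine a few
aggregates (total weight, case count, value range) so that a density-style
check is available at any subtree. The particular aggregates and names are
assumptions for illustration only:

  #include <algorithm>
  #include <cstdint>
  #include <memory>
  #include <queue>
  #include <utility>
  #include <vector>

  struct HuffNode {
    uint64_t Weight = 0;            // total profile weight in this subtree
    unsigned NumCases = 0;          // number of leaf cases in this subtree
    int64_t MinVal = 0, MaxVal = 0; // case-value range, for density checks
    std::unique_ptr<HuffNode> Left, Right; // both null for a leaf

    // Cases per unit of value range: a rough proxy for jump-table friendliness.
    double density() const {
      return (double)NumCases / (double)(MaxVal - MinVal + 1);
    }
  };

  struct HeavierThan {
    bool operator()(const HuffNode *A, const HuffNode *B) const {
      return A->Weight > B->Weight; // makes the priority_queue a min-heap on weight
    }
  };

  // Cases are (value, weight) pairs.
  std::unique_ptr<HuffNode>
  buildHuffman(const std::vector<std::pair<int64_t, uint64_t>> &Cases) {
    std::priority_queue<HuffNode *, std::vector<HuffNode *>, HeavierThan> PQ;
    for (const auto &C : Cases) {
      auto *Leaf = new HuffNode();
      Leaf->Weight = C.second;
      Leaf->NumCases = 1;
      Leaf->MinVal = Leaf->MaxVal = C.first;
      PQ.push(Leaf);
    }
    // Classic Huffman construction: repeatedly merge the two lightest subtrees.
    while (PQ.size() > 1) {
      HuffNode *A = PQ.top(); PQ.pop();
      HuffNode *B = PQ.top(); PQ.pop();
      auto *Parent = new HuffNode();
      Parent->Weight = A->Weight + B->Weight;
      Parent->NumCases = A->NumCases + B->NumCases;
      Parent->MinVal = std::min(A->MinVal, B->MinVal);
      Parent->MaxVal = std::max(A->MaxVal, B->MaxVal);
      Parent->Left.reset(A);
      Parent->Right.reset(B);
      PQ.push(Parent);
    }
    HuffNode *Root = PQ.empty() ? nullptr : PQ.top();
    return std::unique_ptr<HuffNode>(Root);
  }

A "peel the most probable subtrees" pass can then walk down from the root and
stop wherever something like density() crosses a threshold, which is exactly
the kind of decision the propagated data is meant to enable.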
>>
>> On the other hand (more likely), you could keep the cases ordered (i.e.
>> not a Huffman tree) and integrate the branch probability info as the extra
>> data propagated up the tree (along with other stuff); e.g. the max
>> probability of any node in the subtree, the total probability sum in the
>> subtree. This would also allow some highly non-trivial properties, like
>> "maximum number of consecutive cases observed in each subtree", to be
>> maintained, along with O(1) access to the root of the subtree containing
>> the consecutive cases (and if we keep the tree balanced, O(log n) time to
>> extract the consecutive cases and update all data we are propagating up
>> the tree). This would allow a really clean lowering algorithm like:
>>
>> while (we have subtrees with property X)
>>   extract and handle subtree in a particular way
>> while (we have subtrees with property Y)
>>   extract and handle subtree in a different way
>> etc.
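
A minimal sketch of the node augmentation for this ordered-tree variant; the
specific aggregates are only examples, since anything that can be recomputed
from the two children in O(1) works, which is the point of the CLRS-style
scheme. Balancing and the actual extraction loops are omitted:

  #include <algorithm>
  #include <cstdint>

  struct CaseNode {
    int64_t Value;    // case value (the tree is ordered by this)
    uint64_t Weight;  // profile weight of this case
    CaseNode *Left = nullptr, *Right = nullptr;

    // Aggregates propagated up the tree, recomputed after any insertion,
    // deletion, or rotation.
    uint64_t SubtreeWeight = 0; // total profile weight below (and at) this node
    uint64_t MaxCaseWeight = 0; // hottest single case in this subtree
    unsigned SubtreeCases = 0;  // number of cases in this subtree
  };

  // O(1) fix-up, assuming the children's aggregates are already correct.
  void recompute(CaseNode *N) {
    N->SubtreeWeight = N->Weight;
    N->MaxCaseWeight = N->Weight;
    N->SubtreeCases = 1;
    if (CaseNode *L = N->Left) {
      N->SubtreeWeight += L->SubtreeWeight;
      N->MaxCaseWeight = std::max(N->MaxCaseWeight, L->MaxCaseWeight);
      N->SubtreeCases += L->SubtreeCases;
    }
    if (CaseNode *R = N->Right) {
      N->SubtreeWeight += R->SubtreeWeight;
      N->MaxCaseWeight = std::max(N->MaxCaseWeight, R->MaxCaseWeight);
      N->SubtreeCases += R->SubtreeCases;
    }
  }

With the children's aggregates kept correct, queries like "total weight in
this value range" or "hottest case under this subtree" come out of an
ordinary O(log n) descent.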
>>
>> A prototype of this can be done really easily with a non-balanced binary
>> tree. In order to make it reliably fast, the binary tree will need to be
>> upgraded to a balanced binary tree (red-black, AVL, treap, etc.), which I
>> will go ahead and volunteer to do (I've actually spent more time than I
>> care to admit investigating balanced binary trees and their
>> implementation).
>>
>> For a textbook reference on maintaining extra data in a tree, see e.g.
>> section 14.2 of CLRS.
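
The only extra cost balancing adds is re-running that O(1) fix-up on the
nodes whose children change in a rotation, e.g. (a sketch reusing the
hypothetical CaseNode/recompute from above):

  // Left rotation around X; only X and the new subtree root R need their
  // aggregates recomputed, bottom-up.
  CaseNode *rotateLeft(CaseNode *X) {
    CaseNode *R = X->Right;
    X->Right = R->Left;
    R->Left = X;
    recompute(X);
    recompute(R);
    return R;
  }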
>>
>> -- Sean Silva
>>
>>
>>>
>>> > -- Sean Silva
>>> >
>>> >
>>> >>   Lowering the binary tree based on branch probabilities
>>> >> doesn't fit well with this approach.  However, I think we could start
>>> >> by building a Huffman tree and, once we've handled the most heavily
>>> >> weighted nodes, we could fall back to the current logic.  Make sense?
>>> >>
>>> >> > For how many hot cases is a Huffman tree faster than a jump table?
>>> >> > I suspect we'll still want to build jump tables when we can.
>>> >>
>>> >> I take back my previous comment; I agree we should have a hybrid
>>> >> approach, per my comments above.
>>> >>
>>> >> > Thanks,
>>> >> > Hans
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> LLVM Developers mailing list
>>> >> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>> >>
>>> >
>>>
>>>
>>>
>





