[llvm-dev] (RFC) JumpMaps: switch statement optimization

Mon Feb 13 16:23:56 PST 2017

On 2/13/2017 5:11 AM, Witold Waligora via llvm-dev wrote:
> Hi All,
>
> This is our first appearance on llvm-dev so a small introduction is in
> order.
> We are a team of two working for good few years now on an out-of-tree
> embedded back-end. Our primary goal is code size, secondary performance.
> We've recently implemented an optimization that we believe others could
> use. It's about time to contribute back.
>
> -------------------------------------------------------
> JumpMaps: a generalization of JumpTables
> JumpTables produce fast and small code but have a limitation - they only
> work if case values are easy to compute from the variable being
> switch()ed. To overcome this limitation we introduce a {key,value}
> structure - a JumpMap.
>
> Simplifying somewhat, LLVM would currently generate a sequential
> if-elseif- for small switches and an inline binary-search for large
> switches, if it can't produce a JumpTable. JumpMaps would instead
> generate a call followed by a jump-through-register.
> Pseudo-code example:
>
> switch(x) {
>    case 0x6851: {BB1}
>    case 0x1383: {BB2}
>    case 0x2224: {BB3}
>    default:     {defaultBB}
> }
>
> Without Jump Maps:
>    if(x==0x6851) {
>      goto BB1
>    }
>    else if(x==0x1383) {
>      goto BB2
>    }
>    else if(x==0x2224){
>      goto BB3
>    }
>    else{
>      goto defaultBB
>    }
>
> With Jump Maps:
>    jumpmap_0 = {
>      keys = {3, 0x6851, 0x1383, 0x2224}
>      vals = {defaultBB, BB1, BB2, BB3}
>    }
>    addr dst = __jumpmap_find(&jumpmap_0, x)
>    goto dst;
>
> On our target Jump Maps produce both smaller and faster code even for
> quite small switch statements. We believe other architectures would
> benefit as well, X86 and ARM included:
> - jumpmap struct has a good chance of being cached more efficiently than
> a large chunk of inline binary-search code
> - for large switches branch prediction should have an easier time
> compared to inline binary-search
> - embedded architectures (ARM, MSP430) would benefit from smaller code
> (trading off .text for .rodata)
>
> In terms of lowering, JumpMap flow is very similar to that of
> JumpTables. The lowering is even slightly easier as JumpMaps do not
> require an extra range-check basic-block - the default case is also
> handled by __jumpmap_find.
> Targets need to provide lowering for the map structure and
> __jumpmap_find function, which would typically become a part of
> compier-rt library and could be written in hand-crafted asm for extra speed.
>
>
> Questions or comments?
> Please let me know.
>

Smaller codesize for switches could be interesting for Thumb targets.  
Some thoughts.

1. For Thumb, the overall size is more important that constants vs. 
code; the tables will end up in the text segment anyway.

2. How many versions of __jumpmap_find do you need?  If you're going for 
size, you probably want to put 1 or 2 byte relative jump offsets into 
the table, if possible.  You'd also want to shrink the keys if possible: 
for a small range, we could use 2-byte, or even 1-byte keys.  You could 
also encode deltas into the table, rather than absolute values; the 
deltas are likely to be smaller than the keys, and you're iterating over 
the table anyway.

3. Is the linear search a problem?  I mean, the linear nature of it 
doesn't matter for small tables, but eventually you get to a point where 
it just doesn't make sense.  Linearly iterating over 1000 entries in a 
table is clearly going to be slower than a binary search.  Or is the 
idea that you'd do a binary search down to say, 20 entries, then call 
__jumpmap_find?

4. How often does this pattern come up, in your experience?

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project