[llvm-dev] (RFC) JumpMaps: switch statement optimization

Friedman, Eli via llvm-dev llvm-dev at lists.llvm.org
Mon Feb 13 16:23:56 PST 2017

On 2/13/2017 5:11 AM, Witold Waligora via llvm-dev wrote:
> Hi All,
> This is our first appearance on llvm-dev so a small introduction is in
> order.
> We are a team of two working for good few years now on an out-of-tree
> embedded back-end. Our primary goal is code size, secondary performance.
> We've recently implemented an optimization that we believe others could
> use. It's about time to contribute back.
> -------------------------------------------------------
> JumpMaps: a generalization of JumpTables
> JumpTables produce fast and small code but have a limitation - they only
> work if case values are easy to compute from the variable being
> switch()ed. To overcome this limitation we introduce a {key,value}
> structure - a JumpMap.
> Simplifying somewhat, LLVM would currently generate a sequential
> if-elseif- for small switches and an inline binary-search for large
> switches, if it can't produce a JumpTable. JumpMaps would instead
> generate a call followed by a jump-through-register.
> Pseudo-code example:
> switch(x) {
>    case 0x6851: {BB1}
>    case 0x1383: {BB2}
>    case 0x2224: {BB3}
>    default:     {defaultBB}
> }
> Without Jump Maps:
>    if(x==0x6851) {
>      goto BB1
>    }
>    else if(x==0x1383) {
>      goto BB2
>    }
>    else if(x==0x2224){
>      goto BB3
>    }
>    else{
>      goto defaultBB
>    }
> With Jump Maps:
>    jumpmap_0 = {
>      keys = {3, 0x6851, 0x1383, 0x2224}
>      vals = {defaultBB, BB1, BB2, BB3}
>    }
>    addr dst = __jumpmap_find(&jumpmap_0, x)
>    goto dst;
> On our target Jump Maps produce both smaller and faster code even for
> quite small switch statements. We believe other architectures would
> benefit as well, X86 and ARM included:
> - jumpmap struct has a good chance of being cached more efficiently than
> a large chunk of inline binary-search code
> - for large switches branch prediction should have an easier time
> compared to inline binary-search
> - embedded architectures (ARM, MSP430) would benefit from smaller code
> (trading off .text for .rodata)
> In terms of lowering, JumpMap flow is very similar to that of
> JumpTables. The lowering is even slightly easier as JumpMaps do not
> require an extra range-check basic-block - the default case is also
> handled by __jumpmap_find.
> Targets need to provide lowering for the map structure and
> __jumpmap_find function, which would typically become a part of
> compier-rt library and could be written in hand-crafted asm for extra speed.
> Questions or comments?
> Please let me know.

Smaller codesize for switches could be interesting for Thumb targets.  
Some thoughts.

1. For Thumb, the overall size is more important that constants vs. 
code; the tables will end up in the text segment anyway.

2. How many versions of __jumpmap_find do you need?  If you're going for 
size, you probably want to put 1 or 2 byte relative jump offsets into 
the table, if possible.  You'd also want to shrink the keys if possible: 
for a small range, we could use 2-byte, or even 1-byte keys.  You could 
also encode deltas into the table, rather than absolute values; the 
deltas are likely to be smaller than the keys, and you're iterating over 
the table anyway.

3. Is the linear search a problem?  I mean, the linear nature of it 
doesn't matter for small tables, but eventually you get to a point where 
it just doesn't make sense.  Linearly iterating over 1000 entries in a 
table is clearly going to be slower than a binary search.  Or is the 
idea that you'd do a binary search down to say, 20 entries, then call 

4. How often does this pattern come up, in your experience?


Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

More information about the llvm-dev mailing list