[llvm-dev] (RFC) JumpMaps: switch statement optimization
Friedman, Eli via llvm-dev
llvm-dev at lists.llvm.org
Mon Feb 13 16:23:56 PST 2017
On 2/13/2017 5:11 AM, Witold Waligora via llvm-dev wrote:
> Hi All,
>
> This is our first appearance on llvm-dev so a small introduction is in
> order.
> We are a team of two working for good few years now on an out-of-tree
> embedded back-end. Our primary goal is code size, secondary performance.
> We've recently implemented an optimization that we believe others could
> use. It's about time to contribute back.
>
> -------------------------------------------------------
> JumpMaps: a generalization of JumpTables
> JumpTables produce fast and small code but have a limitation - they only
> work if case values are easy to compute from the variable being
> switch()ed. To overcome this limitation we introduce a {key,value}
> structure - a JumpMap.
>
> Simplifying somewhat, LLVM would currently generate a sequential
> if-elseif- for small switches and an inline binary-search for large
> switches, if it can't produce a JumpTable. JumpMaps would instead
> generate a call followed by a jump-through-register.
> Pseudo-code example:
>
> switch(x) {
> case 0x6851: {BB1}
> case 0x1383: {BB2}
> case 0x2224: {BB3}
> default: {defaultBB}
> }
>
> Without Jump Maps:
> if(x==0x6851) {
> goto BB1
> }
> else if(x==0x1383) {
> goto BB2
> }
> else if(x==0x2224){
> goto BB3
> }
> else{
> goto defaultBB
> }
>
> With Jump Maps:
> jumpmap_0 = {
> keys = {3, 0x6851, 0x1383, 0x2224}
> vals = {defaultBB, BB1, BB2, BB3}
> }
> addr dst = __jumpmap_find(&jumpmap_0, x)
> goto dst;
>
> On our target Jump Maps produce both smaller and faster code even for
> quite small switch statements. We believe other architectures would
> benefit as well, X86 and ARM included:
> - jumpmap struct has a good chance of being cached more efficiently than
> a large chunk of inline binary-search code
> - for large switches branch prediction should have an easier time
> compared to inline binary-search
> - embedded architectures (ARM, MSP430) would benefit from smaller code
> (trading off .text for .rodata)
>
> In terms of lowering, JumpMap flow is very similar to that of
> JumpTables. The lowering is even slightly easier as JumpMaps do not
> require an extra range-check basic-block - the default case is also
> handled by __jumpmap_find.
> Targets need to provide lowering for the map structure and
> __jumpmap_find function, which would typically become a part of
> compier-rt library and could be written in hand-crafted asm for extra speed.
>
>
> Questions or comments?
> Please let me know.
>
Smaller codesize for switches could be interesting for Thumb targets.
Some thoughts.
1. For Thumb, the overall size is more important that constants vs.
code; the tables will end up in the text segment anyway.
2. How many versions of __jumpmap_find do you need? If you're going for
size, you probably want to put 1 or 2 byte relative jump offsets into
the table, if possible. You'd also want to shrink the keys if possible:
for a small range, we could use 2-byte, or even 1-byte keys. You could
also encode deltas into the table, rather than absolute values; the
deltas are likely to be smaller than the keys, and you're iterating over
the table anyway.
3. Is the linear search a problem? I mean, the linear nature of it
doesn't matter for small tables, but eventually you get to a point where
it just doesn't make sense. Linearly iterating over 1000 entries in a
table is clearly going to be slower than a binary search. Or is the
idea that you'd do a binary search down to say, 20 entries, then call
__jumpmap_find?
4. How often does this pattern come up, in your experience?
-Eli
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
More information about the llvm-dev
mailing list