[llvm-dev] (RFC) JumpMaps: switch statement optimization

Witold Waligora via llvm-dev llvm-dev at lists.llvm.org
Mon Feb 13 05:11:24 PST 2017

Hi All,

This is our first appearance on llvm-dev so a small introduction is in
We are a team of two working for good few years now on an out-of-tree
embedded back-end. Our primary goal is code size, secondary performance.
We've recently implemented an optimization that we believe others could
use. It's about time to contribute back.

JumpMaps: a generalization of JumpTables
JumpTables produce fast and small code but have a limitation - they only
work if case values are easy to compute from the variable being
switch()ed. To overcome this limitation we introduce a {key,value}
structure - a JumpMap.

Simplifying somewhat, LLVM would currently generate a sequential
if-elseif- for small switches and an inline binary-search for large
switches, if it can't produce a JumpTable. JumpMaps would instead
generate a call followed by a jump-through-register.
Pseudo-code example:

switch(x) {
  case 0x6851: {BB1}
  case 0x1383: {BB2}
  case 0x2224: {BB3}
  default:     {defaultBB}

Without Jump Maps:
  if(x==0x6851) {
    goto BB1
  else if(x==0x1383) {
    goto BB2
  else if(x==0x2224){
    goto BB3
    goto defaultBB

With Jump Maps:
  jumpmap_0 = {
    keys = {3, 0x6851, 0x1383, 0x2224}
    vals = {defaultBB, BB1, BB2, BB3}
  addr dst = __jumpmap_find(&jumpmap_0, x)
  goto dst;

On our target Jump Maps produce both smaller and faster code even for
quite small switch statements. We believe other architectures would
benefit as well, X86 and ARM included:
- jumpmap struct has a good chance of being cached more efficiently than
a large chunk of inline binary-search code
- for large switches branch prediction should have an easier time
compared to inline binary-search
- embedded architectures (ARM, MSP430) would benefit from smaller code
(trading off .text for .rodata)

In terms of lowering, JumpMap flow is very similar to that of
JumpTables. The lowering is even slightly easier as JumpMaps do not
require an extra range-check basic-block - the default case is also
handled by __jumpmap_find.
Targets need to provide lowering for the map structure and
__jumpmap_find function, which would typically become a part of
compier-rt library and could be written in hand-crafted asm for extra speed.

Questions or comments?
Please let me know.

Best Regards,
Witold Waligóra
Myre Laboratories

More information about the llvm-dev mailing list