[llvm-dev] (RFC) JumpMaps: switch statement optimization

Nema, Ashutosh via llvm-dev llvm-dev at lists.llvm.org
Tue Feb 14 01:53:54 PST 2017

Hi Witold,

Can you explain what you meant by adding support in compiler_rt & changes in lowering.

Did you tried any benchmarks, please share if you have any numbers with performance improvements & code size reduction ?


-----Original Message-----
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Witold Waligora via llvm-dev
Sent: Monday, February 13, 2017 6:41 PM
To: llvm-dev at lists.llvm.org
Subject: [llvm-dev] (RFC) JumpMaps: switch statement optimization

Hi All,

This is our first appearance on llvm-dev so a small introduction is in order.
We are a team of two working for good few years now on an out-of-tree embedded back-end. Our primary goal is code size, secondary performance.
We've recently implemented an optimization that we believe others could use. It's about time to contribute back.

JumpMaps: a generalization of JumpTables JumpTables produce fast and small code but have a limitation - they only work if case values are easy to compute from the variable being switch()ed. To overcome this limitation we introduce a {key,value} structure - a JumpMap.

Simplifying somewhat, LLVM would currently generate a sequential
if-elseif- for small switches and an inline binary-search for large switches, if it can't produce a JumpTable. JumpMaps would instead generate a call followed by a jump-through-register.
Pseudo-code example:

switch(x) {
  case 0x6851: {BB1}
  case 0x1383: {BB2}
  case 0x2224: {BB3}
  default:     {defaultBB}

Without Jump Maps:
  if(x==0x6851) {
    goto BB1
  else if(x==0x1383) {
    goto BB2
  else if(x==0x2224){
    goto BB3
    goto defaultBB

With Jump Maps:
  jumpmap_0 = {
    keys = {3, 0x6851, 0x1383, 0x2224}
    vals = {defaultBB, BB1, BB2, BB3}
  addr dst = __jumpmap_find(&jumpmap_0, x)
  goto dst;

On our target Jump Maps produce both smaller and faster code even for quite small switch statements. We believe other architectures would benefit as well, X86 and ARM included:
- jumpmap struct has a good chance of being cached more efficiently than a large chunk of inline binary-search code
- for large switches branch prediction should have an easier time compared to inline binary-search
- embedded architectures (ARM, MSP430) would benefit from smaller code (trading off .text for .rodata)

In terms of lowering, JumpMap flow is very similar to that of JumpTables. The lowering is even slightly easier as JumpMaps do not require an extra range-check basic-block - the default case is also handled by __jumpmap_find.
Targets need to provide lowering for the map structure and __jumpmap_find function, which would typically become a part of compier-rt library and could be written in hand-crafted asm for extra speed.

Questions or comments?
Please let me know.

Best Regards,
Witold Waligóra
Myre Laboratories
LLVM Developers mailing list
llvm-dev at lists.llvm.org

More information about the llvm-dev mailing list