[PATCH] Expand SimplifyCFG to convert certain simple switches to selects

Thu Jun 19 19:27:45 PDT 2014

I wrote some micro-benchmarks, and the results are promising, provided the backend does a good job lowering a select compared to a lookup table or branches + phi nodes.

For example, on ARM this function:

int foo3_without_def(int a) {
  a &= 0x6;
  switch (a) {
    case 0:
      return 1;
    case 2:
      return 2;
    case 4:
      return 1;
    case 6:
      return 2;
    default:
      return 10;
  }
}

was translated as:

_foo3_without_def_norm:
	movw	r1, :lower16:(l_switch.table1-(LPC1_0+8))
	and	r0, r0, #6
	movt	r1, :upper16:(l_switch.table1-(LPC1_0+8))
LPC1_0:
	add	r1, pc, r1
	ldr	r0, [r1, r0, lsl #2]
	bx	lr
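Before the patch the switch becomes a table load. The index is `a & 6`, scaled by 4 (`lsl #2`), so the table is indexed by the masked value directly and must span indices 0..6, of which only the even slots are ever read. The C sketch below models that lowering. The names `switch_table`, `foo3_table` and `check_table_model` are hypothetical, and filling the unused odd slots with 10 (the default result) is an assumption, not necessarily what ends up in l_switch.table1.

```c
#include <assert.h>
#include <limits.h>

/* Original switch, copied from the example above. */
int foo3_without_def(int a) {
  a &= 0x6;
  switch (a) {
    case 0: return 1;
    case 2: return 2;
    case 4: return 1;
    case 6: return 2;
    default: return 10;
  }
}

/* Hypothetical model of the lookup table: indexed by a & 6 directly.
   Odd slots are never read; filling them with 10 is an assumption. */
static const int switch_table[7] = {1, 10, 2, 10, 1, 10, 2};

int foo3_table(int a) {
  unsigned idx = (unsigned)a & 0x6u; /* always 0, 2, 4 or 6 */
  return switch_table[idx];
}

/* Compare the table model against the switch on a window of inputs
   (including negatives) and on the extreme values. */
void check_table_model(void) {
  for (int a = -1024; a <= 1024; ++a)
    assert(foo3_table(a) == foo3_without_def(a));
  assert(foo3_table(INT_MIN) == foo3_without_def(INT_MIN));
  assert(foo3_table(INT_MAX) == foo3_without_def(INT_MAX));
}
```

Every call pays for materializing the table address (movw/movt/add) plus a memory load, which is the cost the select form avoids.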

while after the optimization it becomes:

_foo3_without_def:
	ubfx	r0, r0, #1, #1
	add	r0, r0, #1
	bx	lr

This is much faster.
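Why this works: `a & 0x6` can only produce 0, 2, 4 or 6, so the default case is unreachable and the switch only ever returns 1 or 2, depending on bit 1 of `a`. That is what allows it to collapse into a select, which then folds into the `ubfx`/`add` pair above (`ubfx r0, r0, #1, #1` extracts the one-bit field starting at bit 1). The sketch below spells out both forms in C and checks them against the switch. The helper names are hypothetical, and the select form is only an illustration of the idea, not the exact IR the patch produces.

```c
#include <assert.h>
#include <limits.h>

/* Original switch, copied from the example above. */
int foo3_without_def(int a) {
  a &= 0x6;
  switch (a) {
    case 0: return 1;
    case 2: return 2;
    case 4: return 1;
    case 6: return 2;
    default: return 10; /* unreachable: a is 0, 2, 4 or 6 here */
  }
}

/* Select form: the two distinct case results, chosen by bit 1 of a. */
int foo3_select(int a) {
  return (a & 0x2) ? 2 : 1;
}

/* Bit-extract form mirroring the ARM output:
   ubfx r0, r0, #1, #1  ->  (a >> 1) & 1
   add  r0, r0, #1      ->  + 1
   The unsigned shift avoids implementation-defined signed shifts. */
int foo3_bitfield(int a) {
  return (int)(((unsigned)a >> 1) & 1u) + 1;
}

/* Check both forms against the switch on a window of inputs. */
void check_select_models(void) {
  for (int a = -1024; a <= 1024; ++a) {
    assert(foo3_select(a) == foo3_without_def(a));
    assert(foo3_bitfield(a) == foo3_without_def(a));
  }
  assert(foo3_bitfield(INT_MIN) == foo3_without_def(INT_MIN));
  assert(foo3_bitfield(INT_MAX) == foo3_without_def(INT_MAX));
}
```

The X86 sequence further down (`shrl`, `andl $1`, `leal 1(%rdi)`) computes the same `((a >> 1) & 1) + 1` expression.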

On X86 I got mixed results, with the select version performing about the same as the lookup table. Looking at the generated code, I noticed something strange in the X86 backend's output.
Without the optimization, the same function is translated as:

_foo3_without_def_norm:
	andl	$6, %edi
	leaq	l_switch.table1(%rip), %rax
	movl	(%rax,%rdi,4), %eax
	retq

with the optimization:

_foo3_without_def:
	pushq	%rbp
	movq	%rsp, %rbp
	shrl	%edi
	andl	$1, %edi
	leal	1(%rdi), %eax
	popq	%rbp
	retq

As you can see, the backend emits what I think are unnecessary push/mov/pop instructions for RBP, which cancel out the advantage of the select over the lookup table.
To be honest, I don't know why the X86 backend emits them, but I'm fairly sure they aren't needed and that the whole function could be done in 3 instructions. That would give X86 the same advantage over the lookup-table code that the ARM version has.

http://reviews.llvm.org/D4219