[PATCH] Expand SimplifyCFG to convert certain simple switches to selects
Marcello Maggioni
hayarms at gmail.com
Thu Jun 19 19:27:45 PDT 2014
I ran some micro benchmarks, and the results are promising as long as the backend does a good job with a select compared to a lookup table or branches + phi nodes.
For example on ARM this function:
int foo3_without_def(int a) {
  a &= 0x6;
  switch (a) {
  case 0:
    return 1;
  case 2:
    return 2;
  case 4:
    return 1;
  case 6:
    return 2;
  default:
    return 10;
  }
}
was translated as:
_foo3_without_def_norm:
    movw r1, :lower16:(l_switch.table1-(LPC1_0+8))
    and r0, r0, #6
    movt r1, :upper16:(l_switch.table1-(LPC1_0+8))
LPC1_0:
    add r1, pc, r1
    ldr r0, [r1, r0, lsl #2]
    bx lr
while after the optimization it becomes:
_foo3_without_def:
    ubfx r0, r0, #1, #1
    add r0, r0, #1
    bx lr
which is much faster, since it no longer needs the table address computation or the load.
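To make the equivalence concrete, here is a minimal C sketch (not part of the patch) of what the optimized ARM code computes: ubfx extracts bit 1 of the input and the add turns it into 1 or 2. The names foo3_switch, foo3_bitfield and self_check are made up for illustration. The default case can never be reached, because a & 0x6 can only be 0, 2, 4 or 6.

```c
#include <assert.h>

/* The switch form from the example above. */
int foo3_switch(int a) {
  a &= 0x6;
  switch (a) {
  case 0:
    return 1;
  case 2:
    return 2;
  case 4:
    return 1;
  case 6:
    return 2;
  default:
    return 10; /* unreachable: a & 0x6 is always 0, 2, 4 or 6 */
  }
}

/* Hand-written equivalent of the ubfx + add sequence:
   take bit 1 of the input, then add 1. The cast to unsigned
   keeps the shift well defined for negative inputs. */
int foo3_bitfield(int a) {
  return (int)(((unsigned)a >> 1) & 1u) + 1;
}

/* Hypothetical helper: checks that both forms agree on a range of inputs. */
void self_check(void) {
  for (int a = -1024; a <= 1024; ++a)
    assert(foo3_switch(a) == foo3_bitfield(a));
}
```

Calling self_check() from a small main is enough to confirm that the two forms agree on the tested range.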
On X86 I got mixed results, with the select version performing basically the same as the lookup table version. Looking at the generated code, I noticed some strange output from the X86 backend.
The same function above without the optimization is translated as:
_foo3_without_def_norm:
    andl $6, %edi
    leaq l_switch.table1(%rip), %rax
    movl (%rax,%rdi,4), %eax
    retq
with the optimization:
_foo3_without_def:
    pushq %rbp
    movq %rsp, %rbp
    shrl %edi
    andl $1, %edi
    leal 1(%rdi), %eax
    popq %rbp
    retq
As you can see, the backend emits some (I think) unnecessary pushing, moving and popping of RBP, which takes away the advantage of the select over the lookup table.
To be honest, I don't really know why the X86 backend emits those, but I'm pretty sure they aren't needed and that everything could be done with 3 instructions. That way the X86 code could gain as much over the lookup table version as the ARM code does.
http://reviews.llvm.org/D4219