[llvm-commits] [Patch] Replace switches with lookup tables (PR884)

Duncan Sands baldrick at free.fr
Sat Sep 8 22:46:13 PDT 2012


Hi Evan,

> Thanks for doing this. I also agree this seems like a very valuable optimization.
>
> However, I'm slightly concerned with "where" this is done. I can think of exotic targets where a lookup table is infeasible (certain GPUs, perhaps?). So this falls into the category of optimizations that need target information. In that case, we would be better off doing this late, i.e. in CodeGenPrepare.

Can't such targets turn it back into a switch? An advantage of doing it at the
IR level is the knock-on effect of enabling other IR optimizations to kick in.

Ciao, Duncan.
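Duncan's suggestion relies on the lookup table, together with its range check and default, encoding the same case-to-result mapping as the original switch. As a toy illustration only (a source-level sketch, not how an LLVM backend would actually do it), the hypothetical helper `tableToSwitchSource` rebuilds switch source text from a dense table whose entry `i` holds the result for case value `base + i`:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): emit C++ source for a switch
// equivalent to a dense lookup table whose entry i holds the result for
// case value base + i. Inputs outside the table map to defaultValue.
std::string tableToSwitchSource(const std::vector<int>& table, int base,
                                int defaultValue) {
  std::ostringstream out;
  out << "switch (x) {\n";
  for (std::size_t i = 0; i < table.size(); ++i) {
    // Each table slot becomes exactly one case label.
    out << "  case " << base + static_cast<int>(i) << ": return "
        << table[i] << ";\n";
  }
  // The table's range check becomes the switch's default.
  out << "  default: return " << defaultValue << ";\n";
  out << "}\n";
  return out.str();
}
```

For example, `tableToSwitchSource({7, 8}, 10, -1)` produces cases `10` and `11` returning `7` and `8`, with `-1` as the default.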

>
> Evan
>
> On Sep 7, 2012, at 7:46 AM, Hans Wennborg <hans at chromium.org> wrote:
>
>> On Fri, Sep 7, 2012 at 11:23 AM, Hans Wennborg <hans at chromium.org> wrote:
>>> I've tried compiling Clang using Clang built from r163299 and r163305,
>>> i.e. right before and after my change, respectively. These are the
>>> numbers:
>>>
>>> Using Clang built from r163299 (right before my change):
>>> -rwxr-x--- 1 hwennborg eng 32399524 2012-09-07 10:54 Release/bin/clang
>>>    text    data     bss     dec     hex filename
>>>    27209071     1282264   47504 28538839        1b377d7 Release/bin/clang
>>>
>>> Using Clang built with r163305 (includes my change):
>>> -rwxr-x--- 1 hwennborg eng 31837890 2012-09-07 10:48 Release/bin/clang
>>>    text    data     bss     dec     hex filename
>>>    26642047     1287144   47504 27976695        1aae3f7 Release/bin/clang
>>>
>>> That's a 1.7% (about half a megabyte) reduction in binary size, which is
>>> more than I expected. I counted 426 replaced switches, though not all
>>> of them would have ended up in the clang binary.
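The percentages can be checked directly against the `ls -l` and `size` figures above. A minimal C++ sketch (the constant and function names are illustrative, not from the original message):

```cpp
#include <cassert>

// Figures from the message above, in bytes.
constexpr long long kFileBefore = 32399524;  // ls -l, Clang built from r163299
constexpr long long kFileAfter = 31837890;   // ls -l, Clang built from r163305
constexpr long long kTextBefore = 27209071;  // size(1) "text", r163299
constexpr long long kTextAfter = 26642047;   // size(1) "text", r163305

// Relative shrinkage of `after` compared with `before`, in percent.
double percentReduction(long long before, long long after) {
  return 100.0 * static_cast<double>(before - after) /
         static_cast<double>(before);
}
```

This gives a file-size reduction of about 1.73% (561,634 bytes) and a text-segment reduction of about 2.08% (567,024 bytes), consistent with the 1.7% and half-megabyte figures quoted.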
>>
>> To follow up, I've been trying to figure out where that 500 kB reduction
>> comes from. One of the biggest contributors is PPCMCCodeEmitter.o, which
>> shrank by almost 150 kB.
>>
>> The switch that's transformed to a lookup table is in
>> getPPCRegisterNumbering from PPCBaseInfo.h.
>>
>> Now, that switch is fairly small, covering a range of 176 cases.
>> However, it gets inlined a lot. Specifically, it gets inlined into
>> PPCMCCodeEmitter::getBinaryCodeForInstr about 130 times.
>>
>> That's where the code exploded: we inlined 32 blocks of code plus a
>> 176-element jump table for them at each of the 130 call sites.
>>
>> After my patch, we just inline a comparison, branch, and load. I guess
>> that explains the new code size for this file :) I suspect the other
>> major size reductions are for similar reasons.
>>
>> - Hans
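To make the before/after concrete, here is a hypothetical, heavily shrunk stand-in for a register-numbering switch (the enum `Reg` and both functions are invented for illustration; the real PPC enum and the code the optimizer emits differ). The table form mirrors what Hans describes: one range comparison, one branch, and one load from a constant array, which stays small when inlined at many call sites:

```cpp
#include <cassert>

// Hypothetical, heavily reduced register enum (the real PPC one is much
// larger); several enumerators share one hardware register number.
enum Reg { R0, R1, R2, R3, F0, F1, F2, F3, NumRegs };

// Switch form, similar in spirit to getPPCRegisterNumbering.
unsigned regNumberSwitch(Reg r) {
  switch (r) {
    case R0: case F0: return 0;
    case R1: case F1: return 1;
    case R2: case F2: return 2;
    case R3: case F3: return 3;
    default: return ~0u;  // not a valid register in this toy enum
  }
}

// Lookup-table form: one range comparison, one branch, one load.
unsigned regNumberTable(Reg r) {
  static const unsigned kTable[NumRegs] = {0, 1, 2, 3, 0, 1, 2, 3};
  if (static_cast<unsigned>(r) >= static_cast<unsigned>(NumRegs))
    return ~0u;  // same result as the switch's default
  return kTable[r];
}
```

Both functions return the same value for every enumerator; the difference is only in the code shape, which matters once the function is inlined at many call sites.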
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits