[LLVMdev] LLVM misses some cross-MBB and loop optimizations compared to GCC

Fri Feb 6 00:43:52 PST 2009

Done.

Please check these Bugzilla entries:

http://llvm.org/bugs/show_bug.cgi?id=3495 (LocalSpiller problems)

http://llvm.org/bugs/show_bug.cgi?id=3496 (Loop optimization problems)

-Roman

2009/2/6 Evan Cheng <echeng at apple.com>:
> Thanks. Can you file bugzilla reports? I'll look at the first one soon.
>
> Evan
> On Feb 5, 2009, at 8:08 AM, Roman Levenstein wrote:
>
>> Hi,
>>
>> While testing my new register allocators on some test cases, I've
>> noticed that LLVM sometimes misses optimization opportunities:
>>
>> 1) LocalSpiller::RewriteMBB does not seem to propagate information
>> such as the Spills set between MBBs. In many cases where MBB B1 has
>> only one predecessor MBB B2, B1 could reuse the information about the
>> physical registers that are in the live-out set of B2. This could help
>> eliminate useless reloads from spill slots when the value is already
>> available in the required physical register. In the example below,
>> the marked "movl    12(%esp), %ecx" instruction could be eliminated.
>>
>> .LBB2_2:        # bb31
>>       movl    12(%esp), %ecx
>>       movl    8(%esp), %eax
>>       cmpl    $0, up+28(%eax,%ecx,4)
>>       je      .LBB2_9 # bb569
>> .LBB2_3:        # bb41         ; <--- bb31 is the only predecessor of bb41
>>       movl    12(%esp), %ecx ; <--- This could be eliminated!!!
>>       movl    4(%esp), %eax
>>       cmpl    $0, down(%eax,%ecx,4)
>>       je      .LBB2_9 # bb569
>>
>>
>> It is also worth mentioning that, as far as I can see, reloads from
>> spill slots are currently not recorded in the Spills set using the
>> addAvailable method. Wouldn't it make sense to record them?
>>
>> I have the feeling that these improvements are rather easy to achieve
>> and would not require too many changes to the LocalSpiller. Probably
>> we just need to keep the live-out set of an MBB around after
>> rewriting it, so that its successors can in some cases use it as the
>> initial value for the Spills set.
>>
>> Any opinions?
>>
>> 2) Hoisting of sub-expressions out of loops and replacement of array
>> accesses with pointer-based induction variables is also not optimal in
>> some situations.
>> In the example above, both blocks are executed inside an enclosing
>> loop, and they keep re-evaluating e.g. the down(%eax,%ecx,4)
>> expression on every iteration. GCC, by contrast, hoists this
>> expression out of the loop and replaces it with a simple pointer, as
>> you can see below:
>>
>> .LBB2_2:
>>       movl    -32(%ebp), %edx
>>       movl    28(%edx), %eax
>>       testl   %eax, %eax
>>       je      .L5
>>
>> .LBB2_3:
>>       movl    -48(%ebp), %eax
>>       movl    (%eax), %edi
>>       testl   %edi, %edi
>>       je      .L5
>>
>>
>> To make it possible for you to analyze this test case, I attach the
>> source file, the BC file, and the assembly produced by LLVM
>> and by "GCC -O6".
>>
>> -Roman
>> <8q_speed.c.s><8q_speed.s.gcc><8q_speed.c.bc><8q_speed.c>
>
>
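
To make point 1 of the quoted message concrete, here is a minimal sketch of
the idea, not the actual LocalSpiller code. All names in it (Instr, Block,
AvailableSpills, rewriteBlock, rewriteFunction) are hypothetical stand-ins
made up for illustration. It assumes a block with exactly one predecessor
that has already been rewritten can start from that predecessor's
slot-to-register map, and it also records reloads as available values.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical, simplified model; these are NOT LLVM's real classes.
using StackSlot = int;
using PhysReg = int;

// Maps a spill slot to the physical register currently holding its value.
using AvailableSpills = std::map<StackSlot, PhysReg>;

struct Instr {
  enum Kind { Reload, Def } kind; // Reload: reg = load slot; Def: reg is overwritten
  PhysReg reg;
  StackSlot slot;                 // only meaningful for Reload
};

struct Block {
  std::vector<Instr> instrs;
  std::vector<std::size_t> preds; // indices of predecessor blocks
};

// Forget every slot whose cached copy lived in `reg`.
static void clobber(AvailableSpills &avail, PhysReg reg) {
  for (auto it = avail.begin(); it != avail.end();) {
    if (it->second == reg)
      it = avail.erase(it);
    else
      ++it;
  }
}

// Rewrites one block starting from `avail`, drops reloads whose value is
// already in the requested register, and returns the state at block exit.
static AvailableSpills rewriteBlock(Block &b, AvailableSpills avail,
                                    int &removed) {
  std::vector<Instr> kept;
  for (const Instr &i : b.instrs) {
    if (i.kind == Instr::Reload) {
      auto it = avail.find(i.slot);
      if (it != avail.end() && it->second == i.reg) {
        ++removed; // value already sits in the right register
        continue;
      }
      clobber(avail, i.reg);
      avail[i.slot] = i.reg; // record the reload as an available value
    } else {
      clobber(avail, i.reg);
    }
    kept.push_back(i);
  }
  b.instrs = kept;
  return avail;
}

// Processes blocks in layout order. A block is seeded from its predecessor
// only if it has exactly one and that predecessor was already rewritten;
// otherwise it conservatively starts with an empty map.
int rewriteFunction(std::vector<Block> &blocks) {
  std::vector<AvailableSpills> out(blocks.size());
  std::vector<bool> done(blocks.size(), false);
  int removed = 0;
  for (std::size_t n = 0; n < blocks.size(); ++n) {
    AvailableSpills seed;
    const std::vector<std::size_t> &preds = blocks[n].preds;
    if (preds.size() == 1 && done[preds[0]])
      seed = out[preds[0]];
    out[n] = rewriteBlock(blocks[n], seed, removed);
    done[n] = true;
  }
  return removed;
}
```

Fed a model of bb31 followed by bb41 from the example (reloads of slots 12
and 8, then of slots 12 and 4), this sketch drops the second reload of slot
12 and keeps the other three. A real implementation would also have to
handle everything the sketch ignores, such as instructions that clobber
registers implicitly and stores that change the contents of a slot.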