[LLVMdev] LLVM misses some cross-MBB and loop optimizations compared to GCC

Thu Feb 5 22:40:50 PST 2009

Thanks. Can you file bugzilla reports? I'll look at the first one soon.

Evan
On Feb 5, 2009, at 8:08 AM, Roman Levenstein wrote:

> Hi,
>
> While testing my new register allocators on some test cases, I've
> noticed that LLVM sometimes misses optimization opportunities:
>
> 1) LocalSpiller::RewriteMBB does not seem to propagate information
> (e.g. the Spills set) between MBBs. In many cases where MBB B1 has
> only one predecessor, MBB B2, B1 could reuse the information about the
> physical registers that are in the live-out set of B2. This could help
> eliminate useless reloads from spill slots when the value is already
> available in the required physical register. In the example below,
> the marked "movl    12(%esp), %ecx" instruction could be eliminated.
>
> .LBB2_2:        # bb31
>        movl    12(%esp), %ecx
>        movl    8(%esp), %eax
>        cmpl    $0, up+28(%eax,%ecx,4)
>        je      .LBB2_9 # bb569
> .LBB2_3:        # bb41         ; <--- bb31 is the only predecessor of bb41
>        movl    12(%esp), %ecx ; <--- This could be eliminated!!!
>        movl    4(%esp), %eax
>        cmpl    $0, down(%eax,%ecx,4)
>        je      .LBB2_9 # bb569
>
>
> It is also worth mentioning that, as far as I can see, reloads from
> spill slots are currently not recorded in the Spills set using the
> addAvailable method. Wouldn't it make sense to record them?
>
> I have the feeling that these improvements are rather easy to achieve
> and would not require too many changes to the LocalSpiller. Probably,
> we just need to keep the live-out set of the MBB around after
> rewriting it, so that its successors can use it in some cases as the
> initial value for the Spills set.
>
> Any opinions?
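The proposal in 1) can be sketched as a toy model. This is not the LocalSpiller code: Insn, Block, Avail, killReg and removeRedundantReloads are hypothetical names invented for illustration, registers and stack slots are plain integers, and stores to spill slots are left out. It only shows the idea of seeding a block's available set from the live-out set of its single predecessor, and of recording reloads as available.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy model only; none of these types exist in LLVM.
struct Insn {
    enum Kind { Reload, Clobber } kind;
    int reg;   // physical register written by the instruction
    int slot;  // stack slot read (meaningful for Reload only)
};

struct Block {
    std::vector<Insn> insns;
    int singlePred = -1;  // index of the only predecessor, or -1
};

// Maps a stack slot to the physical register that currently holds its value.
using Avail = std::map<int, int>;

// Forget every slot whose value lived in `reg`.
static void killReg(Avail &avail, int reg) {
    for (auto it = avail.begin(); it != avail.end();) {
        if (it->second == reg)
            it = avail.erase(it);
        else
            ++it;
    }
}

// Removes reloads whose value is already in the requested register and
// returns how many were removed. Assumes a single predecessor always
// appears before its successor in `blocks`.
int removeRedundantReloads(std::vector<Block> &blocks) {
    std::vector<Avail> liveOut(blocks.size());
    int removed = 0;
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        Block &mbb = blocks[b];
        Avail avail;
        if (mbb.singlePred >= 0) {
            assert(static_cast<std::size_t>(mbb.singlePred) < b);
            avail = liveOut[mbb.singlePred];  // start from the live-out set
        }
        std::vector<Insn> kept;
        for (const Insn &insn : mbb.insns) {
            if (insn.kind == Insn::Reload) {
                auto it = avail.find(insn.slot);
                if (it != avail.end() && it->second == insn.reg) {
                    ++removed;  // value is already in place: drop the reload
                    continue;
                }
                killReg(avail, insn.reg);
                avail[insn.slot] = insn.reg;  // record the reload as available
            } else {
                killReg(avail, insn.reg);
            }
            kept.push_back(insn);
        }
        mbb.insns = kept;
        liveOut[b] = avail;
    }
    return removed;
}
```

Modelling bb31 as two reloads (slot 12 into %ecx, slot 8 into %eax) and bb41 as its single successor with reloads of slot 12 into %ecx and slot 4 into %eax, the sketch drops only the marked reload of slot 12 in bb41.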
>
> 2) Hoisting sub-expressions out of loops and replacing array accesses
> with pointer-based induction variables is also not optimal in some
> situations.
>   In the example above, both blocks are executed inside a loop
> enclosing them, and they keep evaluating e.g. the down(%eax,%ecx,4)
> expression on every iteration. GCC, by contrast, hoists this
> expression out of the loop and replaces it with a simple pointer, as
> you can see below:
>
> .LBB2_2:
>        movl    -32(%ebp), %edx
>        movl    28(%edx), %eax
>        testl   %eax, %eax
>        je      .L5
>
> .LBB2_3:
>        movl    -48(%ebp), %eax
>        movl    (%eax), %edi
>        testl   %edi, %edi
>        je      .L5
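At source level, the difference between the two listings can be illustrated roughly as follows. This is a sketch under assumptions, not the code from 8q_speed.c: the array size, the function names, and the premise that base and idx stay unchanged inside the loop are all made up for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the global array `down` in the listings.
static int down[64];

// Indexed form: the address down + (base + idx) is formed on every
// iteration, like the operand down(%eax,%ecx,4) in the LLVM output.
int countZeroIndexed(int base, int idx, int iterations) {
    assert(base + idx >= 0 && base + idx < 64);
    int zeros = 0;
    for (int i = 0; i < iterations; ++i)
        if (down[base + idx] == 0)
            ++zeros;
    return zeros;
}

// Hoisted form: the address is computed once before the loop and the loop
// body only dereferences a pointer, like "movl (%eax), %edi" in the GCC
// output.
int countZeroHoisted(int base, int idx, int iterations) {
    assert(base + idx >= 0 && base + idx < 64);
    const int *p = &down[base + idx];
    int zeros = 0;
    for (int i = 0; i < iterations; ++i)
        if (*p == 0)
            ++zeros;
    return zeros;
}
```

Both functions return the same result; the second form only moves the address computation out of the loop body.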
>
>
> To make it possible for you to analyze this test case, I attach the
> source file, the BC file, and the assembly output produced by LLVM
> and by "GCC -O6".
>
> -Roman
> <8q_speed.c.s><8q_speed.s.gcc><8q_speed.c.bc><8q_speed.c>