[LLVMdev] Failure to optimize ? operator

Wed Dec 14 05:25:32 PST 2011

Apologies for the formatting but you get the point.

Neo

On Wed, Dec 14, 2011 at 10:19 PM, Brent Walker <brenthwalker at gmail.com> wrote:
> I don't understand your point.  Which version is better does NOT
> depend on what inputs are passed to the function.  The compiled code
> for (as per llvm) f1 will always take less time to execute than f2.
>
> for x > 0 => T(f1) < T(f2)
> for x <= 0 => T(f1) = T(f2)
>
> where T() is the time to execute the given function.
>
> So always T(f1) <= T(f2).
>
> I would call this a missed optimization opportunity.  I think it
> warrants a bug report.
>
> If I do the same experiment with gcc I get identical code for the two functions:
>
> ==============================================
> _f1:        pushl   %ebp        xorl    %eax, %eax        movl
> %esp, %ebp        movl    8(%ebp), %edx        testl   %edx, %edx
>   jle     L5        popl    %ebp        ret        .p2align 4,,7L5:
>     movl    %edx, %ecx        imull   %edx, %ecx        popl    %ebp
>      leal    3(%ecx,%ecx,4), %eax        imull   %edx, %eax
> leal    1(%eax,%ecx,2), %eax        ret        .p2align 4,,15
> _f2:
>         pushl   %ebp        xorl    %eax, %eax        movl    %esp,
> %ebp        movl    8(%ebp), %edx        testl   %edx, %edx        jle
>     L9        popl    %ebp        ret        .p2align 4,,7L9:
> movl    %edx, %ecx        imull   %edx, %ecx        popl    %ebp
>  leal    3(%ecx,%ecx,4), %eax        imull   %edx, %eax        leal
> 1(%eax,%ecx,2), %eax
> ret==============================================
>
> Brent
>
>
> On Wed, Dec 14, 2011 at 9:58 AM, Eli Friedman <eli.friedman at gmail.com> wrote:
>> On Tue, Dec 13, 2011 at 5:59 AM, Brent Walker <brenthwalker at gmail.com> wrote:
>>> The following seemingly identical functions, get compiled to quite
>>> different machine code.  The first is correctly optimized (the
>>> computation of var y is nicely moved into the else branch of the "if"
>>> statement), which the second one is not (the full computation of var y
>>> is always done).  The output was produced using the demo page on
>>> llvm's web site (optimization level LTO).
>>>
>>> Can someone shed some light on this strange behavior?
>>
>> The IR you're seeing is the IR we naturally generate for the given IR,
>> and LLVM doesn't have an optimization to turn your f2 into f1.  Which
>> version is better depends on the values passed into the function,
>> which makes it a bit tricky to write an optimization to turn one into
>> the other.
>>
>> -Eli