[LLVMdev] Failure to optimize ? operator
Brent Walker
brenthwalker at gmail.com
Wed Dec 14 05:25:32 PST 2011
Apologies for the formatting but you get the point.
Neo
On Wed, Dec 14, 2011 at 10:19 PM, Brent Walker <brenthwalker at gmail.com> wrote:
> I don't understand your point. Which version is better does NOT
> depend on what inputs are passed to the function. The compiled code
> for (as per llvm) f1 will always take less time to execute than f2.
>
> for x > 0 => T(f1) < T(f2)
> for x <= 0 => T(f1) = T(f2)
>
> where T() is the time to execute the given function.
>
> So always T(f1) <= T(f2).
>
> I would call this a missed optimization opportunity. I think it
> warrants a bug report.
>
> If I do the same experiment with gcc I get identical code for the two functions:
>
> ==============================================
> _f1: pushl %ebp xorl %eax, %eax movl
> %esp, %ebp movl 8(%ebp), %edx testl %edx, %edx
> jle L5 popl %ebp ret .p2align 4,,7L5:
> movl %edx, %ecx imull %edx, %ecx popl %ebp
> leal 3(%ecx,%ecx,4), %eax imull %edx, %eax
> leal 1(%eax,%ecx,2), %eax ret .p2align 4,,15
> _f2:
> pushl %ebp xorl %eax, %eax movl %esp,
> %ebp movl 8(%ebp), %edx testl %edx, %edx jle
> L9 popl %ebp ret .p2align 4,,7L9:
> movl %edx, %ecx imull %edx, %ecx popl %ebp
> leal 3(%ecx,%ecx,4), %eax imull %edx, %eax leal
> 1(%eax,%ecx,2), %eax
> ret==============================================
>
> Brent
>
>
> On Wed, Dec 14, 2011 at 9:58 AM, Eli Friedman <eli.friedman at gmail.com> wrote:
>> On Tue, Dec 13, 2011 at 5:59 AM, Brent Walker <brenthwalker at gmail.com> wrote:
>>> The following seemingly identical functions, get compiled to quite
>>> different machine code. The first is correctly optimized (the
>>> computation of var y is nicely moved into the else branch of the "if"
>>> statement), which the second one is not (the full computation of var y
>>> is always done). The output was produced using the demo page on
>>> llvm's web site (optimization level LTO).
>>>
>>> Can someone shed some light on this strange behavior?
>>
>> The IR you're seeing is the IR we naturally generate for the given IR,
>> and LLVM doesn't have an optimization to turn your f2 into f1. Which
>> version is better depends on the values passed into the function,
>> which makes it a bit tricky to write an optimization to turn one into
>> the other.
>>
>> -Eli
More information about the llvm-dev
mailing list