[LLVMdev] Possible missed optimization?

Sat Sep 4 12:26:59 PDT 2010

Could it be that an optimization was actually missed?
It seems that there is some sort of look-ahead optimization is
happening because the constant value was moved into r14, when the
optimal choice was not to move r14 and keep the variable into r14. All
these R14 with different values is confusing. My mental SSA form looks
like:

t1 = b_high ^ a_low
t2 = b_low ^ a_low
t3 = 18
t4 = t2 ^ t3
5: return t1, t4

Now, when the registers are allocated, the t4 attempts to use a
register already used by either t2 or t3. However a constant value is
in t3. The allocator could attempt to re-use t3 instead of t2 because
it doesn't know about t3 being a constant. Thus it inserts a move on
t2 and re-uses t3. At the allocator standpoint consider:

1: t1 = b_high ^ a_high ; b_high and a_high are dead and t1 may be any
b_high or b_low.
2: t2 = b_low ^ a_low   ; b_low and a_low are also dead. t2 may be either.
3: t3 = 18                    ; 18 may be any register
4: t4 = t2 ^ t3               ; t4 may reuse t2 or t3.
5: return t1, t4

The optimal value at (1) is of course R14 or a_high's old value. This
is a simple choice. The other choice of t4 is a bit harder because it
may be t2 or t3. At first glance a large enumerative tree search would
be exponential. However, I don't know much of the theory behind
reusing registers like this.

Thanks,
Jeff Kunkel

On Sat, Sep 4, 2010 at 12:54 PM, Borja Ferrer <borja.ferav at gmail.com> wrote:
>
> Hello, while testing trivial functions in my backend i noticed a suboptimal way of assigning regs that had the following pattern, consider the following function:
>
> typedef unsigned short t;
> t foo(t a, t b)
> {
>     t a4 = b^a^18;
>     return a4;
> }
>
> Argument "a" is passed in R15:R14 and argument "b" is passed in R13:R12, the return value is stored in R15:R14.
>
> Producing the following asm code:
>     xor    r15, r13  ; xor top part
>     mov    r8, r14
>     xor    r8, r12   ; xor bottom part
>     movi    r14, 18
>     xor    r14, r8  ; xor bottom part with imm value
>
> However it could have produced:
>     xor     r15, r13
>     xor     r14, r12
>     movi   r8, 18
>     xor     r14, r8
>
> This saves the first move instruction by using any scratch register when moving the imm value instead of using r14 which is used as an incoming reg and as the return value. Is this a missed optimization from LLVM or did i miss something out?
> Changing the register allocation order didnt work.
>
> Thanks
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>