[LLVMdev] Possible missed optimization?

Sat Mar 26 16:09:41 PDT 2011

>
> You can look at the output of -debug-only=regcoalescing to see what is
> going on.
>
This is the debug output I got. Some of the information is a bit cryptic to
me, so my reading of it follows the dump:

********** SIMPLE REGISTER COALESCING **********
********** Function: foo
********** JOINING INTERVALS ***********
entry:
16L    %vreg0<def> = COPY %R25R24<kill>; DREGS:%vreg0
    Considering merging %vreg0 with physreg %R25R24
        RHS = %vreg0 = [16d,96d:0)  0 at 16d
        LHS = %R25R24,inf = [0L,16d:0)  0 at 0L-phidef
        updated: 96L    %vreg8<def> = COPY %R25R24<kill>; PTRREGS:%vreg8
        updated: 32L    %vreg5<def> = COPY %R25R24; PTRREGS:%vreg5
    Joined. Result = %R25R24,inf = [0L,96d:0)  0 at 0L-phidef
32L    %vreg5<def> = COPY %R25R24; PTRREGS:%vreg5
    Not coalescable.
64L    %vreg6<def> = COPY %vreg4<kill>; DLDREGS:%vreg6,%vreg4
    Considering merging %vreg4 with %vreg6 to DLDREGS
        RHS = %vreg4 = [48d,64d:0)  0 at 48d
        LHS = %vreg6 = [64d,80d:1)[80d,112d:0)  0 at 80d 1 at 64d
        updated: 48L    %vreg6<def> = LDWRd %vreg5<kill>; mem:LD2[%a](align=1)(tbaa=!"int") DLDREGS:%vreg6 PTRREGS:%vreg5
    Joined. Result = %vreg6 = [48d,80d:1)[80d,112d:0)  0 at 80d 1 at 48d
96L    %vreg8<def> = COPY %R25R24<kill>; PTRREGS:%vreg8
    Not coalescable.
********** INTERVALS POST JOINING **********
%R24,inf = [0L,16d:0)  0 at 0L-phidef
%vreg6 = [48d,80d:1)[80d,112d:0)  0 at 80d 1 at 48d
%R25R24,inf = [0L,96d:0)  0 at 0L-phidef
%vreg8 = [96d,112d:0)  0 at 96d
%vreg5 = [32d,48d:0)  0 at 32d
%R25,inf = [0L,16d:0)  0 at 0L-phidef
********** INTERVALS **********
%R24,inf = [0L,16d:0)  0 at 0L-phidef
%vreg6 = [48d,80d:1)[80d,112d:0)  0 at 80d 1 at 48d
%R25R24,inf = [0L,96d:0)  0 at 0L-phidef
%vreg8 = [96d,112d:0)  0 at 96d
%vreg5 = [32d,48d:0)  0 at 32d
%R25,inf = [0L,16d:0)  0 at 0L-phidef
********** MACHINEINSTRS **********
# Machine code for function foo:
Function Live Ins: %R25R24 in reg%2147483648

0L    BB#0: derived from LLVM BB %entry
        Live Ins: %R25R24
32L        %vreg5<def> = COPY %R25R24; PTRREGS:%vreg5
48L        %vreg6<def> = LDWRd %vreg5<kill>; mem:LD2[%a](align=1)(tbaa=!"int") DLDREGS:%vreg6 PTRREGS:%vreg5
80L        %vreg6<def> = ANDIWRdK %vreg6, 255; DLDREGS:%vreg6
96L        %vreg8<def> = COPY %R25R24<kill>; PTRREGS:%vreg8
112L        STWRr %vreg8<kill>, %vreg6<kill>; mem:ST2[%a](align=1)(tbaa=!"int") PTRREGS:%vreg8 DLDREGS:%vreg6
128L        RET

What I see is the first copy getting coalesced, so vreg0 goes away. When it
tries, and succeeds, to coalesce vreg4 with vreg6, it kills vreg5; I don't
know why. Because of the first coalesce, R25R24 gets reloaded, and for the
last COPY it says it can't be coalesced. I guess that is because it's trying
to coalesce a physreg; if the copy were from vreg5 instead, it would coalesce
it.

> Cross class coalescing also has some heuristics to prevent it from
> creating very small register classes

I've seen isWinToJoinCrossClass in SimpleRegisterCoalescing.cpp, which does
exactly what you describe here. It has a comment that says:

  // This heuristics is good enough in practice, but it's obviously not *right*.
  // 4 is a magic number that works well enough for x86, ARM, etc.

However, this piece of code is not getting executed, so the problem in this
specific case seems to be somewhere else? I would also like to ask whether
this heuristic could be parametrized: on small CPUs, register classes aren't
as big as on x86 or other beasts, so 4, the number used in this specific
heuristic, seems high for these CPUs.
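
To make the parametrization idea concrete, here is a rough sketch of what I
have in mind. It is not LLVM code: CoalescerTuning, SmallClassLimit and
isLargeEnoughClass are names I made up for illustration, and the real
heuristic looks at more than just the register count. The only thing taken
from the source is the default of 4.

```cpp
// Hypothetical per-target tuning knob; nothing like this exists in LLVM.
struct CoalescerTuning {
  // Register classes with at most this many allocatable registers are
  // treated as "small". The default mirrors the magic number 4 quoted above.
  unsigned SmallClassLimit = 4;
};

// Simplified model of the size check only: returns true when the class that
// would result from a cross-class join has more allocatable registers than
// the target's limit, so the join would not create a "very small" class.
bool isLargeEnoughClass(unsigned NumAllocatableRegs,
                        const CoalescerTuning &Tuning) {
  return NumAllocatableRegs > Tuning.SmallClassLimit;
}
```

With the default limit, a class of 4 registers is still considered small,
while a target for a small CPU could set SmallClassLimit to 2 so that a class
of 3 registers passes the check.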