[LLVMdev] Possible missed optimization?
Borja Ferrer
borja.ferav at gmail.com
Sat Mar 26 16:09:41 PDT 2011
>
> You can look at the output of -debug-only=regcoalescing to see what is
> going on.
>
This is the debug output I got. Some of the information is a bit cryptic to
me, so my reading of it follows the dump:
********** SIMPLE REGISTER COALESCING **********
********** Function: foo
********** JOINING INTERVALS ***********
entry:
16L %vreg0<def> = COPY %R25R24<kill>; DREGS:%vreg0
Considering merging %vreg0 with physreg %R25R24
RHS = %vreg0 = [16d,96d:0) 0 at 16d
LHS = %R25R24,inf = [0L,16d:0) 0 at 0L-phidef
updated: 96L %vreg8<def> = COPY %R25R24<kill>; PTRREGS:%vreg8
updated: 32L %vreg5<def> = COPY %R25R24; PTRREGS:%vreg5
Joined. Result = %R25R24,inf = [0L,96d:0) 0 at 0L-phidef
32L %vreg5<def> = COPY %R25R24; PTRREGS:%vreg5
Not coalescable.
64L %vreg6<def> = COPY %vreg4<kill>; DLDREGS:%vreg6,%vreg4
Considering merging %vreg4 with %vreg6 to DLDREGS
RHS = %vreg4 = [48d,64d:0) 0 at 48d
LHS = %vreg6 = [64d,80d:1)[80d,112d:0) 0 at 80d 1 at 64d
updated: 48L %vreg6<def> = LDWRd %vreg5<kill>;
mem:LD2[%a](align=1)(tbaa=!"int") DLDREGS:%vreg6 PTRREGS:%vreg5
Joined. Result = %vreg6 = [48d,80d:1)[80d,112d:0) 0 at 80d 1 at 48d
96L %vreg8<def> = COPY %R25R24<kill>; PTRREGS:%vreg8
Not coalescable.
********** INTERVALS POST JOINING **********
%R24,inf = [0L,16d:0) 0 at 0L-phidef
%vreg6 = [48d,80d:1)[80d,112d:0) 0 at 80d 1 at 48d
%R25R24,inf = [0L,96d:0) 0 at 0L-phidef
%vreg8 = [96d,112d:0) 0 at 96d
%vreg5 = [32d,48d:0) 0 at 32d
%R25,inf = [0L,16d:0) 0 at 0L-phidef
********** INTERVALS **********
%R24,inf = [0L,16d:0) 0 at 0L-phidef
%vreg6 = [48d,80d:1)[80d,112d:0) 0 at 80d 1 at 48d
%R25R24,inf = [0L,96d:0) 0 at 0L-phidef
%vreg8 = [96d,112d:0) 0 at 96d
%vreg5 = [32d,48d:0) 0 at 32d
%R25,inf = [0L,16d:0) 0 at 0L-phidef
********** MACHINEINSTRS **********
# Machine code for function foo:
Function Live Ins: %R25R24 in reg%2147483648
0L BB#0: derived from LLVM BB %entry
Live Ins: %R25R24
32L %vreg5<def> = COPY %R25R24; PTRREGS:%vreg5
48L %vreg6<def> = LDWRd %vreg5<kill>;
mem:LD2[%a](align=1)(tbaa=!"int") DLDREGS:%vreg6 PTRREGS:%vreg5
80L %vreg6<def> = ANDIWRdK %vreg6, 255; DLDREGS:%vreg6
96L %vreg8<def> = COPY %R25R24<kill>; PTRREGS:%vreg8
112L STWRr %vreg8<kill>, %vreg6<kill>;
mem:ST2[%a](align=1)(tbaa=!"int") PTRREGS:%vreg8 DLDREGS:%vreg6
128L RET
What I see is the first copy getting coalesced, so vreg0 goes away. Then, when
it tries and succeeds in coalescing vreg4 with vreg6, it kills vreg5, and I
don't know why. Because of the first coalesce, R25R24 gets reloaded again, and
for the last COPY it says it can't be coalesced. I guess that is because it's
trying to coalesce with a phys reg; if the copy were from vreg5 instead, it
would coalesce it.
>> Cross class coalescing also has some heuristics to prevent it from
>> creating very small register classes
I've seen isWinToJoinCrossClass in SimpleRegisterCoalescing.cpp, which does
exactly what you describe here. It has a check with this comment:
// This heuristics is good enough in practice, but it's obviously not *right*.
// 4 is a magic number that works well enough for x86, ARM, etc.
However, this piece of code is not getting executed, so in this specific case
the problem seems to be somewhere else? Still, I would like to ask whether
this could be parametrized somehow: on small CPUs, register classes aren't as
big as on x86 or other beasts, so 4, the number used in this specific
heuristic, seems high for them.
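To make the question concrete, here is a minimal sketch of what I mean by
parametrizing the threshold. It is standalone C++, not code taken from
SimpleRegisterCoalescing.cpp: the CrossClassJoinPolicy struct, its
MinClassSize field and the isClassLargeEnough function are hypothetical
names, and the strict ">" comparison is an assumption on my part.

```cpp
// Hypothetical per-target knob; nothing with this name exists in LLVM.
struct CrossClassJoinPolicy {
  // A join is accepted outright only if the resulting register class has
  // more allocatable registers than this. The value 4 mirrors the magic
  // number mentioned in the comment quoted above.
  unsigned MinClassSize = 4;
};

// Returns true when the register class produced by a cross-class join is
// considered large enough under the given policy. This covers only the
// size threshold; any other heuristics are deliberately left out.
inline bool isClassLargeEnough(const CrossClassJoinPolicy &Policy,
                               unsigned NewClassSize) {
  return NewClassSize > Policy.MinClassSize;
}
```

A target with small register classes could then pick a lower value, for
example CrossClassJoinPolicy{2}, while x86 or ARM would keep the default
of 4.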