<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div style="word-wrap:break-word"><div>You can look at the output of -debug-only=regcoalescing to see what is going on.</div>
<font color="#888888"><br></font></div></blockquote></div>This is the debug output I got. Some of the information is a bit cryptic to me, so what I understood from it follows the dump:<br><br>********** SIMPLE REGISTER COALESCING **********<br>
********** Function: foo<br>********** JOINING INTERVALS ***********<br>entry:<br>16L %vreg0<def> = COPY %R25R24<kill>; DREGS:%vreg0<br> Considering merging %vreg0 with physreg %R25R24<br> RHS = %vreg0 = [16d,96d:0) 0@16d<br>
LHS = %R25R24,inf = [0L,16d:0) 0@0L-phidef<br> updated: 96L %vreg8<def> = COPY %R25R24<kill>; PTRREGS:%vreg8<br> updated: 32L %vreg5<def> = COPY %R25R24; PTRREGS:%vreg5<br> Joined. Result = %R25R24,inf = [0L,96d:0) 0@0L-phidef<br>
32L %vreg5<def> = COPY %R25R24; PTRREGS:%vreg5<br> Not coalescable.<br>64L %vreg6<def> = COPY %vreg4<kill>; DLDREGS:%vreg6,%vreg4<br> Considering merging %vreg4 with %vreg6 to DLDREGS<br> RHS = %vreg4 = [48d,64d:0) 0@48d<br>
LHS = %vreg6 = [64d,80d:1)[80d,112d:0) 0@80d 1@64d<br> updated: 48L %vreg6<def> = LDWRd %vreg5<kill>; mem:LD2[%a](align=1)(tbaa=!"int") DLDREGS:%vreg6 PTRREGS:%vreg5<br> Joined. Result = %vreg6 = [48d,80d:1)[80d,112d:0) 0@80d 1@48d<br>
96L %vreg8<def> = COPY %R25R24<kill>; PTRREGS:%vreg8<br> Not coalescable.<br>********** INTERVALS POST JOINING **********<br>%R24,inf = [0L,16d:0) 0@0L-phidef<br>%vreg6 = [48d,80d:1)[80d,112d:0) 0@80d 1@48d<br>
%R25R24,inf = [0L,96d:0) 0@0L-phidef<br>%vreg8 = [96d,112d:0) 0@96d<br>%vreg5 = [32d,48d:0) 0@32d<br>%R25,inf = [0L,16d:0) 0@0L-phidef<br>********** INTERVALS **********<br>%R24,inf = [0L,16d:0) 0@0L-phidef<br>%vreg6 = [48d,80d:1)[80d,112d:0) 0@80d 1@48d<br>
%R25R24,inf = [0L,96d:0) 0@0L-phidef<br>%vreg8 = [96d,112d:0) 0@96d<br>%vreg5 = [32d,48d:0) 0@32d<br>%R25,inf = [0L,16d:0) 0@0L-phidef<br>********** MACHINEINSTRS **********<br># Machine code for function foo:<br>Function Live Ins: %R25R24 in reg%2147483648<br>
<br>0L BB#0: derived from LLVM BB %entry<br> Live Ins: %R25R24<br>32L %vreg5<def> = COPY %R25R24; PTRREGS:%vreg5<br>48L %vreg6<def> = LDWRd %vreg5<kill>; mem:LD2[%a](align=1)(tbaa=!"int") DLDREGS:%vreg6 PTRREGS:%vreg5<br>
80L %vreg6<def> = ANDIWRdK %vreg6, 255; DLDREGS:%vreg6<br>96L %vreg8<def> = COPY %R25R24<kill>; PTRREGS:%vreg8<br>112L STWRr %vreg8<kill>, %vreg6<kill>; mem:ST2[%a](align=1)(tbaa=!"int") PTRREGS:%vreg8 DLDREGS:%vreg6<br>
128L RET<br><br>What I see is the first copy getting coalesced, so vreg0 goes away. When it tries (and succeeds) to coalesce vreg4 with vreg6, it kills vreg5, and I don't know why. Because of the first coalesce, R25R24 gets copied again at 96L, and for that last COPY it says "Not coalescable". I guess that is because it's trying to coalesce a physical register; if the source were vreg5 instead, I think it would coalesce it.<br>
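<br>To check my reading of the interval notation, here is a rough sketch of the overlap test as I understand it. Everything in it is a toy model of mine and not LLVM code: the Index/Range types and the numeric slot encoding are assumptions, and only the interval values and the L-before-d ordering are taken from the dump above. It also ignores the special cases the real coalescer handles for copies.<br>

```cpp
#include <cassert>
#include <vector>

// Sub-slot of an instruction number. The dump shows 16L before 16d, so L < d;
// the concrete values 0 and 2 are an assumption made for this sketch.
enum Slot { Load = 0, Def = 2 };

// Hypothetical model of a slot index such as "48d".
struct Index {
  unsigned num;
  Slot slot;
  unsigned key() const { return num * 4 + slot; }
};

// Half-open range [start, end), as printed in the dump.
struct Range {
  Index start;
  Index end;
};

using Interval = std::vector<Range>;

// True if any range of a shares at least one slot with any range of b.
inline bool overlaps(const Interval &a, const Interval &b) {
  for (const Range &x : a)
    for (const Range &y : b)
      if (x.start.key() < y.end.key() && y.start.key() < x.end.key())
        return true;
  return false;
}

// %vreg4 = [48d,64d:0), before joining.
inline Interval vreg4() {
  return Interval{Range{Index{48, Def}, Index{64, Def}}};
}

// %vreg6 = [64d,80d:1)[80d,112d:0), before joining.
inline Interval vreg6() {
  return Interval{Range{Index{64, Def}, Index{80, Def}},
                  Range{Index{80, Def}, Index{112, Def}}};
}

// %vreg8 = [96d,112d:0).
inline Interval vreg8() {
  return Interval{Range{Index{96, Def}, Index{112, Def}}};
}
```

In this model vreg4 ends exactly where vreg6 begins, so the two never overlap, which is consistent with the "Joined" line for them. vreg6 and vreg8 are both live between 96d and 112d, so they do overlap.<br>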
<br>>> Cross class coalescing also has some heuristics to prevent it from creating very small register classes<br>I've seen isWinToJoinCrossClass in SimpleRegisterCoalescing.cpp, which does exactly what you describe here. It has a check with this comment:<br>
<br>// This heuristics is good enough in practice, but it's obviously not *right*.<br> // 4 is a magic number that works well enough for x86, ARM, etc.<br><br>However, this piece of code is not getting executed, so in this specific case the problem seems to be somewhere else. Still, I would like to ask whether this could be parametrized in some way: on small CPUs, register classes aren't as big as on x86 or other beasts, so 4, the number used in this specific heuristic, seems high for them.<br>
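<br>To make the parametrization idea concrete, something along these lines is what I have in mind. This is only a sketch and not the code in SimpleRegisterCoalescing.cpp: CrossClassPolicy, MinClassSize and isClassBigEnough are names I made up, the comparison is my simplification, and the only thing taken from the source is the constant 4 from the comment quoted above.<br>

```cpp
#include <cassert>

// Hypothetical per-target knob replacing the hard-coded magic number.
struct CrossClassPolicy {
  unsigned MinClassSize = 4; // default mirrors the 4 in the quoted comment
};

// Returns true when the merged register class is considered big enough for a
// cross-class join to go ahead without further checks. Simplified on purpose:
// the real heuristic may look at more than the class size.
inline bool isClassBigEnough(unsigned NumAllocatableRegs,
                             const CrossClassPolicy &Policy) {
  return NumAllocatableRegs > Policy.MinClassSize;
}
```

With the default policy, a hypothetical class of 3 registers would be treated as too small, while a target that lowers MinClassSize to 2 would let the same class through.<br>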
<br>