[PATCH] D29436: RegisterCoalescer: Fix joinReservedPhysReg()

Krzysztof Parzyszek via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 6 06:48:42 PST 2017


kparzysz accepted this revision.
kparzysz added inline comments.
This revision is now accepted and ready to land.


================
Comment at: lib/CodeGen/RegisterCoalescer.cpp:1825
+        // definition/valno of SrcReg.
+        for (SlotIndex SI = Indexes->getNextNonNullIndex(S.start);
+             SI < S.end; SI = Indexes->getNextNonNullIndex(SI)) {
----------------
MatzeB wrote:
> kparzysz wrote:
> > I think starting from the next index is somewhat dangerous.  Consider this scenario: Hexagon has an instruction "J4_jumpsetr Rd, Rs, #bb", which is equivalent to "Rd = Rs; jump #bb".  This one will not cause a problem here, but it inspired me to invent a new hypothetical instruction where the branch is indirect: "J4_jumprsetr Rd, Rs, Rt", defined as "Rd = Rs; jumpr Rt".
> > 
> > I converted the testcase to Hexagon (use R30 instead of FP), with the change that it now uses the invented instruction to define virtual register:  "J4_jumprsetr %0, R0, R30". Before coalescing, this would be equivalent to "%0 = R0; jumpr R30", after coalescing it becomes "R30 = R0; jumpr R30".  While with the Hexagon semantics the R30 in the target of the branch would still be the old value, I'm not sure that this can be said about any potential architecture (which would support this kind of instructions).  Reserved physical registers may not obey the same use/def ordering, and there is no guarantee that the branch to R30 above would not be redirected to the newly defined value (or that is still a legal instruction to begin with).
> > 
> > Here's the modified testcase:
> > 
> > ```
> > # RUN: llc -mtriple=hexagon -run-pass=simple-register-coalescing %s -o - | FileCheck %s
> > --- |
> >   define void @func1() { ret void }
> > ...
> > ---
> > name: func1
> > body: |
> >   bb.0:
> >     successors: %bb.3, %bb.4
> >     J2_jumpt undef %p0, %bb.3, implicit def %pc
> >     J2_jump %bb.4, implicit-def %pc
> > 
> >   bb.1:
> >     successors: %bb.2, %bb.5
> >     %r30 = COPY %0
> >     J2_jumpt undef %p0, %bb.2, implicit-def %pc
> >     J2_jump %bb.5, implicit-def %pc
> > 
> >   bb.2:
> >     successors: %bb.6
> >     %r30 = COPY %r0; outside the lifetime of %0, so shouldn't matter
> >     J2_jump %bb.6, implicit-def %pc
> > 
> >   bb.3:
> >     %r0 = COPY %r30 ; outside the lifetime of %0, so shouldn't matter
> >     J2_jumpr %r31, implicit-def %pc
> > 
> >   bb.5:
> >     S2_storeri_io %r30, 0, %r30
> >     J2_jumpr %r31, implicit-def %pc
> > 
> >   bb.6:
> >     J2_jumpr %r31, implicit-def %pc
> > 
> >   bb.4:
> >     successors: %bb.1
> >     %0 : intregs = J4_jumprsetr %r0, %r30, implicit-def %pc
> > ...
> > ```
> > 
> > You'd need to add a definition of it to the Hexagon target for the testcase to work, but you can change the testcase to use J4_jumpsetr (no "r" in the middle) to still get an instruction that uses R30.
> > 
> > Here's the output:
> > 
> > ```
> > ********** SIMPLE REGISTER COALESCING **********
> > ********** Function: func1
> > ********** JOINING INTERVALS ***********
> > (null):
> > 64B     %R30<def> = COPY %vreg0; IntRegs:%vreg0
> >         Considering merging %vreg0 with %R30
> >                 RHS = %vreg0 [48B,64r:0)[304r,320B:0)  0 at 304r
> >                 Removing phys reg def of %R30 at 64B
> >                 updated: 304B   %R30<def> = J4_jumprsetr %R0, %R30, %PC<imp-def>
> >         Success: %vreg0 -> %R30
> >         Result = %R30
> > (null):
> > (null):
> > 128B    %R30<def> = COPY %R0
> >         Not coalescable.
> > (null):
> > (null):
> > 176B    %R0<def> = COPY %R30
> >         Not coalescable.
> > (null):
> > (null):
> > Trying to inflate 0 regs.
> > ```
> > 
> > and we end up with
> > 
> > ```
> > 288B    BB#6:
> >             Predecessors according to CFG: BB#0
> > 304B            %R30<def> = J4_jumprsetr %R0, %R30, %PC<imp-def>
> >             Successors according to CFG: BB#1(0x80000000 / 0x80000000 = 100.00%)
> > ```
> Sorry, I am getting confused here. For my understand: Your concern is about the defining instruction of the virtual register. You think of "R30 = R0; jumpr R30" as execution in parallel (VLIW style). IN terms of register allocation/semantics this would be: read R0 and R30, the execute the two instructions; then write R30? That would match the semantics we model in our liveranges. If this is not the case we would need to set the definition to the earlyclobber register slot. You are right that this case is not handled correctly in the patch right now. I'll update.
> 
Yes, although I missed the earlyclobber solution (I was thinking about the concept of "late use" and the earlyclobber flag eluded me here).


Repository:
  rL LLVM

https://reviews.llvm.org/D29436





More information about the llvm-commits mailing list