[PATCH] D62614: Fix for the OCL/LC to failure on some OCLPerf tests

Nicolai Hähnle via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 5 00:33:05 PDT 2019


nhaehnle added inline comments.


================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:4174-4180
+      unsigned Reg = Op.getReg();
+      MachineInstr *DefMI = MRI.getUniqueVRegDef(Reg);
+      if (DefMI && DefMI->getParent() != InsertBB &&
+          MDT->dominates(DefMI->getParent(), InsertBB)) {
+        InsertBB = DefMI->getParent();
+        Insert = InsertBB->getFirstTerminator();
+      }
----------------
alex-t wrote:
> nhaehnle wrote:
> > Can you give an example of why this is necessary? Assuming we have LCSSA phis, I don't understand the reason for this.
> 
> The copy may be placed to the block produced by the edge splitting like below:
> 
> the copy of %14 in the example below will be placed to the bb.4.Flow that is already out of the loop body.
> 
> 
> bb.2.Flow7:
> ; predecessors: %bb.0, %bb.4
>   successors: %bb.5(0x80000000); %bb.5(100.00%)
> 
>   **%7:vgpr_32 **= PHI %108:vgpr_32, %bb.0, **%14:sreg_32_xm0,** %bb.4
>   SI_END_CF %6:sreg_64, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
>   S_BRANCH %bb.5
> 
> bb.3.for.body:
> ; predecessors: %bb.1, %bb.3
>   successors: %bb.4(0x04000000), %bb.3(0x7c000000); %bb.4(3.12%), %bb.3(96.88%)
> 
>   
>   %14:sreg_32_xm0 = S_ADD_I32 %10:sreg_32_xm0, killed %86:sreg_32_xm0, implicit-def dead $scc
>   
>   SI_LOOP %15:sreg_64, %bb.3, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
>   S_BRANCH %bb.4
> 
> bb.4.Flow:
> ; predecessors: %bb.3
>   successors: %bb.2(0x80000000); %bb.2(100.00%)
> 
>   SI_END_CF %15:sreg_64, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
>   S_BRANCH %bb.2
> 
> 
> bb.5.for.end:
> ; predecessors: %bb.2
> 
>   %16:vgpr_32 = PHI **%7:vgpr_32, %bb.2**
> 
That PHI is not an LCSSA phi. Is this with a compilation that has D60834?


================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:5969-5979
+bool SIInstrInfo::shouldSink(const MachineInstr &MI) const {
+  const MachineRegisterInfo &MRI = MI.getParent()->getParent()->getRegInfo();
+  if (MI.isCopy()) {
+    unsigned SrcReg = MI.getOperand(1).getReg();
+    unsigned DstReg = MI.getOperand(0).getReg();
+    if (RI.isSGPRReg(MRI, SrcReg) && !RI.isSGPRReg(MRI, DstReg)) {
+      return false;
----------------
alex-t wrote:
> nhaehnle wrote:
> > shouldSink should never be used for correctness. See the comment:
> > ```
> >   /// MachineSink determines on its own whether the instruction is safe to sink;
> >   /// this gives the target a hook to override the default behavior with regards
> >   /// to which instructions should be sunk.
> > ```
> > I suspect the right fix is to ensure that COPY instructions inserted by legalizeGenericOperands are marked with an implicit use of EXEC when a VGPR is involved.
> Comment says: "this gives the target a hook to override the default behavior with regards
>  to which instructions should be sunk.".  So, I don't understand why it "should never be used for correctness".
> Also, adding extra operands to the copies leads Peephole optimizer to crash in assert
> 
> Assertion failed: Def->getNumOperands() == 2 && "Invalid number of operands", file llvm\lib\CodeGen\PeepholeOptimizer.cpp, line 1811
> 
> The Si Fix VGPR Copies pass adds implicit uses of EXEC later on...
It should never be used for correctness because the comment says so: **MachineSink determines on its own whether the instruction is safe to sink**. This change makes that comment incorrect.

IMO it would be better to remove the assertion in the ValueTracker. Sure, it claims that bad things will happen everywhere, but we're already using COPY with an implicit EXEC use through much of the compiler pipeline without bad things happening, and this is the semantically right way forward.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62614/new/

https://reviews.llvm.org/D62614





More information about the llvm-commits mailing list