[PATCH] D62614: Fix for the OCL/LC to failure on some OCLPerf tests

Alexander via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 3 08:01:12 PDT 2019


alex-t marked 2 inline comments as done.
alex-t added inline comments.


================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:4174-4180
+      unsigned Reg = Op.getReg();
+      MachineInstr *DefMI = MRI.getUniqueVRegDef(Reg);
+      if (DefMI && DefMI->getParent() != InsertBB &&
+          MDT->dominates(DefMI->getParent(), InsertBB)) {
+        InsertBB = DefMI->getParent();
+        Insert = InsertBB->getFirstTerminator();
+      }
----------------
nhaehnle wrote:
> Can you give an example of why this is necessary? Assuming we have LCSSA phis, I don't understand the reason for this.

The copy may be placed to the block produced by the edge splitting like below:

the copy of %14 in the example below will be placed to the bb.4.Flow that is already out of the loop body.


bb.2.Flow7:
; predecessors: %bb.0, %bb.4
  successors: %bb.5(0x80000000); %bb.5(100.00%)

  **%7:vgpr_32 **= PHI %108:vgpr_32, %bb.0, **%14:sreg_32_xm0,** %bb.4
  SI_END_CF %6:sreg_64, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
  S_BRANCH %bb.5

bb.3.for.body:
; predecessors: %bb.1, %bb.3
  successors: %bb.4(0x04000000), %bb.3(0x7c000000); %bb.4(3.12%), %bb.3(96.88%)

  
  %14:sreg_32_xm0 = S_ADD_I32 %10:sreg_32_xm0, killed %86:sreg_32_xm0, implicit-def dead $scc
  
  SI_LOOP %15:sreg_64, %bb.3, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
  S_BRANCH %bb.4

bb.4.Flow:
; predecessors: %bb.3
  successors: %bb.2(0x80000000); %bb.2(100.00%)

  SI_END_CF %15:sreg_64, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
  S_BRANCH %bb.2


bb.5.for.end:
; predecessors: %bb.2

  %16:vgpr_32 = PHI **%7:vgpr_32, %bb.2**



================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:5969-5979
+bool SIInstrInfo::shouldSink(const MachineInstr &MI) const {
+  const MachineRegisterInfo &MRI = MI.getParent()->getParent()->getRegInfo();
+  if (MI.isCopy()) {
+    unsigned SrcReg = MI.getOperand(1).getReg();
+    unsigned DstReg = MI.getOperand(0).getReg();
+    if (RI.isSGPRReg(MRI, SrcReg) && !RI.isSGPRReg(MRI, DstReg)) {
+      return false;
----------------
nhaehnle wrote:
> shouldSink should never be used for correctness. See the comment:
> ```
>   /// MachineSink determines on its own whether the instruction is safe to sink;
>   /// this gives the target a hook to override the default behavior with regards
>   /// to which instructions should be sunk.
> ```
> I suspect the right fix is to ensure that COPY instructions inserted by legalizeGenericOperands are marked with an implicit use of EXEC when a VGPR is involved.
Comment says: "this gives the target a hook to override the default behavior with regards
 to which instructions should be sunk.".  So, I don't understand why it "should never be used for correctness".
Also, adding extra operands to the copies leads Peephole optimizer to crash in assert

Assertion failed: Def->getNumOperands() == 2 && "Invalid number of operands", file llvm\lib\CodeGen\PeepholeOptimizer.cpp, line 1811

The Si Fix VGPR Copies pass adds implicit uses of EXEC later on...


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62614/new/

https://reviews.llvm.org/D62614





More information about the llvm-commits mailing list