[PATCH] D54340: AMDGPU: Fix various issues around the VirtReg2Value mapping

Fri Nov 9 11:51:53 PST 2018

nhaehnle created this revision.
nhaehnle added reviewers: alex-t, arsenm, rampitec.
Herald added subscribers: t-tye, tpr, dstuttard, yaxunl, wdng, jvesely, kzhuravl.

The VirtReg2Value mapping is crucial for getting consistently
reliable divergence information into the SelectionDAG. This
patch fixes a bunch of issues that lead to incorrect divergence
info and introduces tight assertions to ensure we don't regress:

1. VirtReg2Value is generated lazily; there were some cases where a lookup was performed before all relevant virtual registers were created, leading to an out-of-sync mapping. Those cases were:
  - Complex code to lower formal arguments that generated CopyFromReg nodes from live-in registers (fixed by never querying the mapping for live-in registers).
  - Code that generates CopyToReg for formal arguments that are used outside the entry basic block (fixed by never querying the mapping for Register nodes, which don't need the divergence info anyway).

2. For complex values that are lowered to a sequence of registers, all registers must be reflected in the VirtReg2Value mapping.

I am not adding any new tests, since I'm not actually aware of any
bugs that these problems are causing with trunk as-is. However,
I recently added a test case (in r346423) which fails when https://reviews.llvm.org/D53283 is
applied without this change. Also, the new assertions should provide
most of the effective test coverage.

There is one test change in sdwa-peephole.ll. The underlying issue
is that since the divergence info is now correct, the DAGISel will
select V_OR_B32 directly instead of S_OR_B32. This leads to an extra
COPY which affects the behavior of MachineLICM in a way that ends up
with the S_MOV_B32 with the constant in a different basic block than
the V_OR_B32, which is presumably what defeats the peephole.


Repository:
  rL LLVM

https://reviews.llvm.org/D54340

Files:
  include/llvm/CodeGen/FunctionLoweringInfo.h
  lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
  lib/Target/AMDGPU/SIISelLowering.cpp
  test/CodeGen/AMDGPU/sdwa-peephole.ll


Index: test/CodeGen/AMDGPU/sdwa-peephole.ll
===================================================================

--- test/CodeGen/AMDGPU/sdwa-peephole.ll
+++ test/CodeGen/AMDGPU/sdwa-peephole.ll
@@ -501,7 +501,12 @@
 ; GCN-LABEL: {{^}}sdwa_crash_inlineasm_def:
 ; GCN: s_mov_b32 s{{[0-9]+}}, 0xffff
 ; GCN: v_and_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}, v{{[0-9]+}}
-; GCN: v_or_b32_e32 v{{[0-9]+}}, 0x10000,
+;
+; TODO: Why is the constant not peepholed into the v_or_b32_e32?
+;
+; NOSDWA: s_mov_b32 [[CONST:s[0-9]+]], 0x10000
+; NOSDWA: v_or_b32_e32 v{{[0-9]+}}, s0,
+; SDWA: v_or_b32_e32 v{{[0-9]+}}, 0x10000,
 define amdgpu_kernel void @sdwa_crash_inlineasm_def() #0 {
 bb:
   br label %bb1
Index: lib/Target/AMDGPU/SIISelLowering.cpp
===================================================================
--- lib/Target/AMDGPU/SIISelLowering.cpp
+++ lib/Target/AMDGPU/SIISelLowering.cpp
@@ -9284,11 +9284,21 @@
   Known.Zero.setHighBits(AssumeFrameIndexHighZeroBits);
 }
 
+static bool isCopyFromRegOfInlineAsm(const SDNode *N) {
+  assert(N->getOpcode() == ISD::CopyFromReg);
+  do {
+    // Follow the chain until we find an INLINEASM node.
+    N = N->getOperand(0).getNode();
+    if (N->getOpcode() == ISD::INLINEASM)
+      return true;
+  } while (N->getOpcode() == ISD::CopyFromReg);
+  return false;
+}
+
 bool SITargetLowering::isSDNodeSourceOfDivergence(const SDNode * N,
   FunctionLoweringInfo * FLI, LegacyDivergenceAnalysis * KDA) const
 {
   switch (N->getOpcode()) {
-    case ISD::Register:
     case ISD::CopyFromReg:
     {
       const RegisterSDNode *R = nullptr;
@@ -9317,8 +9327,13 @@
           // are conservatively considered divergent
           else if (!AMDGPU::isEntryFunctionCC(FLI->Fn->getCallingConv()))
             return true;
+          return false;
         }
-        return !KDA || KDA->isDivergent(FLI->getValueFromVirtualReg(Reg));
+        const Value *V = FLI->getValueFromVirtualReg(Reg);
+        if (V)
+          return KDA->isDivergent(V);
+        assert(Reg == FLI->DemoteRegister || isCopyFromRegOfInlineAsm(N));
+        return TRI.isVGPR(MRI, Reg);
       }
     }
     break;
Index: lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
+++ lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
@@ -579,9 +579,18 @@
 const Value *
 FunctionLoweringInfo::getValueFromVirtualReg(unsigned Vreg) {
   if (VirtReg2Value.empty()) {
+    SmallVector<EVT, 4> ValueVTs;
     for (auto &P : ValueMap) {
-      VirtReg2Value[P.second] = P.first;
+      ValueVTs.clear();
+      ComputeValueVTs(*TLI, Fn->getParent()->getDataLayout(),
+                      P.first->getType(), ValueVTs);
+      unsigned Reg = P.second;
+      for (EVT VT : ValueVTs) {
+        unsigned NumRegisters = TLI->getNumRegisters(Fn->getContext(), VT);
+        for (unsigned i = 0, e = NumRegisters; i != e; ++i)
+          VirtReg2Value[Reg++] = P.first;
+      }
     }
   }
-  return VirtReg2Value[Vreg];
+  return VirtReg2Value.lookup(Vreg);
 }
Index: include/llvm/CodeGen/FunctionLoweringInfo.h
===================================================================
--- include/llvm/CodeGen/FunctionLoweringInfo.h
+++ include/llvm/CodeGen/FunctionLoweringInfo.h
@@ -246,6 +246,7 @@
       return 0;
     unsigned &R = ValueMap[V];
     assert(R == 0 && "Already initialized this value register!");
+    assert(VirtReg2Value.empty());
     return R = CreateRegs(V->getType());
   }
 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D54340.173402.patch
Type: text/x-patch
Size: 3518 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20181109/727bc98d/attachment.bin>