[PATCH] D21562: [AMDGPU] Wave and register controls

Fri Jul 1 13:02:48 PDT 2016

arsenm added a comment.

Can you shrink the test cases? They are pretty big for not testing the content


================
Comment at: lib/Target/AMDGPU/AMDGPUSubtarget.cpp:269-285
@@ +268,19 @@
+  if (Requested.first > Requested.second) {
+    Ctx.emitError("invalid amdgpu-flat-work-group-size: "
+                  "minimum is greater than maximum");
+    return Default;
+  }
+  // Make sure requested values do not violate subtarget's specifications.
+  if (Requested.first < getMinFlatWorkGroupSize()) {
+    Ctx.emitError("invalid amdgpu-flat-work-group-size: "
+                  "requested minimum flat work group size is not supported by "
+                  "the subtarget");
+    return Default;
+  }
+  if (Requested.second > getMaxFlatWorkGroupSize()) {
+    Ctx.emitError("invalid amdgpu-flat-work-group-size: "
+                  "requested maximum flat work group size is not supported by "
+                  "the subtarget");
+    return Default;
+  }
+
----------------
More DiagnosticInfos. These should also provide the function so clang will be able to provide a more useful debug location

================
Comment at: lib/Target/AMDGPU/AMDGPUSubtarget.cpp:294-309
@@ +293,18 @@
+
+  // Default minimum/maximum number of active waves per execution unit.
+  std::pair<unsigned, unsigned> Default = std::pair<unsigned, unsigned>(1, 0);
+  // Default/requested minimum/maximum flat work group sizes.
+  std::pair<unsigned, unsigned> FlatWorkGroupSizes = getFlatWorkGroupSizes(F);
+  // If minimum/maximum flat work group sizes were explicitly requested using
+  // "amdgpu-flat-work-group-size" attribute, then set default minimum/maximum
+  // number of active waves per execution unit to values implied by requested
+  // minimum/maximum flat work group sizes.
+  unsigned ImpliedByMinFlatWorkGroupSize =
+    getMaxNumActiveWavesPerEU(FlatWorkGroupSizes.first);
+  unsigned ImpliedByMaxFlatWorkGroupSize =
+    getMaxNumActiveWavesPerEU(FlatWorkGroupSizes.second);
+  unsigned MinImpliedByFlatWorkGroupSize =
+    std::min(ImpliedByMinFlatWorkGroupSize, ImpliedByMaxFlatWorkGroupSize);
+  unsigned MaxImpliedByFlatWorkGroupSize =
+    std::max(ImpliedByMinFlatWorkGroupSize, ImpliedByMaxFlatWorkGroupSize);
+  if (F.hasFnAttribute("amdgpu-flat-work-group-size")) {
----------------
Can you put blank lines after code before a comment, it's hard to read with it this dense

================
Comment at: lib/Target/AMDGPU/AMDGPUSubtarget.cpp:327-329
@@ +326,5 @@
+      Requested.first > getMaxNumActiveWavesPerEU()) {
+    Ctx.emitError("invalid amdgpu-num-active-waves-per-eu: "
+                  "requested minimum number of active waves per execution unit "
+                  "is not supported by the subtarget");
+    return Default;
----------------
This should use DiagnosticInfoUnsupported. Although on second thought these should probably just clamp and diagnosis in clang

================
Comment at: lib/Target/AMDGPU/AMDGPUSubtarget.cpp:333-335
@@ +332,5 @@
+  if (Requested.second > getMaxNumActiveWavesPerEU()) {
+    Ctx.emitError("invalid amdgpu-num-active-waves-per-eu: "
+                  "requested maximum number of active waves per execution unit "
+                  "is not supported by the subtarget");
+    return Default;
----------------
Ditto

================
Comment at: lib/Target/AMDGPU/AMDGPUSubtarget.cpp:342-345
@@ +341,6 @@
+      Requested.second > MaxImpliedByFlatWorkGroupSize) {
+    Ctx.emitError("invalid amdgpu-num-active-waves-per-eu: "
+                  "requested number of active waves per execution unit cannot "
+                  "be achieved with default or requested flat work group "
+                  "sizes");
+    return Default;
----------------
Ditto

================
Comment at: lib/Target/AMDGPU/AMDGPUSubtarget.h:445
@@ +444,3 @@
+  /// subtarget's specifications.
+  std::pair<unsigned, unsigned> getFlatWorkGroupSizes(const Function &F) const;
+
----------------
kzhuravl wrote:
> The reason I put this function here is because it either returns flat work group size for a given function or subtarget's defaults, but I am not sure if this is the right place.
We probably shouldn't be using IR classes in the subtarget, since there's a desire to someday be able to throw away the IR during codegen, but since there's already one thing already doing this here is fine for now

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:32
@@ -31,3 +31,3 @@
 SIInstrInfo::SIInstrInfo(const AMDGPUSubtarget &st)
-    : AMDGPUInstrInfo(st), RI() {}
+    : AMDGPUInstrInfo(st), RI(st) {}
 
----------------
This doesn't look right

================
Comment at: lib/Target/AMDGPU/SIMachineFunctionInfo.cpp:51-52
@@ -50,4 +50,4 @@
     ReturnsVoid(true),
-    MaximumWorkGroupSize(0),
-    DebuggerReservedVGPRCount(0),
+    FlatWorkGroupSizes(std::pair<unsigned, unsigned>(0, 0)),
+    NumActiveWavesPerEU(std::pair<unsigned, unsigned>(0, 0)),
     LDSWaveSpillSize(0),
----------------
I think just 0, 0 or 0u, 0u should work

================
Comment at: lib/Target/AMDGPU/SIMachineFunctionInfo.h:66
@@ +65,3 @@
+  std::pair<unsigned, unsigned> FlatWorkGroupSizes;
+  // A pair of default/requested minimum/maximum number of active waves per
+  // execution unit. Minimum - first, maximum - second.
----------------
Separating line

================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:214
@@ -276,4 +213,3 @@
   MachineFunction *MF = MBB->getParent();
-  const AMDGPUSubtarget &Subtarget = MF->getSubtarget<AMDGPUSubtarget>();
-  const TargetInstrInfo *TII = Subtarget.getInstrInfo();
+  const TargetInstrInfo *TII = ST.getInstrInfo();
 
----------------
SIInstrInfo

================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:236
@@ -299,5 +235,3 @@
   MachineFunction *MF = MBB->getParent();
-  const AMDGPUSubtarget &Subtarget = MF->getSubtarget<AMDGPUSubtarget>();
-  const SIInstrInfo *TII
-    = static_cast<const SIInstrInfo *>(Subtarget.getInstrInfo());
+  const SIInstrInfo *TII = static_cast<const SIInstrInfo *>(ST.getInstrInfo());
 
----------------
You can remove the cast

================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.h:28
@@ -27,2 +27,3 @@
 private:
+  const AMDGPUSubtarget &ST;
   unsigned SGPR32SetID;
----------------
I don't think the RegisterInfo should have a reference to the Subtarget

================
Comment at: test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-many-regs.ll:7-18
@@ +6,14 @@
+; CHECK: NumVGPRsForNumActiveWavesPerEU: 32
+define void @many_regs_128_256(i32 addrspace(1)* nocapture %A) #0 {
+entry:
+  %a = alloca [111 x i32], align 4
+  %0 = bitcast [111 x i32]* %a to i8*
+  call void @llvm.lifetime.start(i64 444, i8* %0) #2
+  %1 = load i32, i32 addrspace(1)* %A, align 4, !tbaa !7
+  %arrayidx1 = getelementptr inbounds [111 x i32], [111 x i32]* %a, i32 0, i32 0
+  store i32 %1, i32* %arrayidx1, align 4, !tbaa !7
+  %arrayidx.1 = getelementptr inbounds i32, i32 addrspace(1)* %A, i64 1
+  %2 = load i32, i32 addrspace(1)* %arrayidx.1, align 4, !tbaa !7
+  %arrayidx1.1 = getelementptr inbounds [111 x i32], [111 x i32]* %a, i32 0, i32 1
+  store i32 %2, i32* %arrayidx1.1, align 4, !tbaa !7
+  %arrayidx.2 = getelementptr inbounds i32, i32 addrspace(1)* %A, i64 2
----------------
Can you run instnamer on this and use CHECK-LABEL

================
Comment at: test/CodeGen/AMDGPU/attr-amdgpu-num-active-waves-per-eu-invalid.ll:8
@@ +7,3 @@
+}
+attributes #0 = {"amdgpu-num-active-waves-per-eu"="10,1"}
+
----------------
Some tests with unparsable sizes would be good


http://reviews.llvm.org/D21562