[PATCH] D21562: [AMDGPU] Wave and register controls
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Fri Jul 1 13:02:48 PDT 2016
arsenm added a comment.
Can you shrink the test cases? They are pretty big for not testing the content
================
Comment at: lib/Target/AMDGPU/AMDGPUSubtarget.cpp:269-285
@@ +268,19 @@
+ if (Requested.first > Requested.second) {
+ Ctx.emitError("invalid amdgpu-flat-work-group-size: "
+ "minimum is greater than maximum");
+ return Default;
+ }
+ // Make sure requested values do not violate subtarget's specifications.
+ if (Requested.first < getMinFlatWorkGroupSize()) {
+ Ctx.emitError("invalid amdgpu-flat-work-group-size: "
+ "requested minimum flat work group size is not supported by "
+ "the subtarget");
+ return Default;
+ }
+ if (Requested.second > getMaxFlatWorkGroupSize()) {
+ Ctx.emitError("invalid amdgpu-flat-work-group-size: "
+ "requested maximum flat work group size is not supported by "
+ "the subtarget");
+ return Default;
+ }
+
----------------
More DiagnosticInfos. These should also provide the function so clang will be able to provide a more useful debug location
================
Comment at: lib/Target/AMDGPU/AMDGPUSubtarget.cpp:294-309
@@ +293,18 @@
+
+ // Default minimum/maximum number of active waves per execution unit.
+ std::pair<unsigned, unsigned> Default = std::pair<unsigned, unsigned>(1, 0);
+ // Default/requested minimum/maximum flat work group sizes.
+ std::pair<unsigned, unsigned> FlatWorkGroupSizes = getFlatWorkGroupSizes(F);
+ // If minimum/maximum flat work group sizes were explicitly requested using
+ // "amdgpu-flat-work-group-size" attribute, then set default minimum/maximum
+ // number of active waves per execution unit to values implied by requested
+ // minimum/maximum flat work group sizes.
+ unsigned ImpliedByMinFlatWorkGroupSize =
+ getMaxNumActiveWavesPerEU(FlatWorkGroupSizes.first);
+ unsigned ImpliedByMaxFlatWorkGroupSize =
+ getMaxNumActiveWavesPerEU(FlatWorkGroupSizes.second);
+ unsigned MinImpliedByFlatWorkGroupSize =
+ std::min(ImpliedByMinFlatWorkGroupSize, ImpliedByMaxFlatWorkGroupSize);
+ unsigned MaxImpliedByFlatWorkGroupSize =
+ std::max(ImpliedByMinFlatWorkGroupSize, ImpliedByMaxFlatWorkGroupSize);
+ if (F.hasFnAttribute("amdgpu-flat-work-group-size")) {
----------------
Can you put blank lines after code before a comment, it's hard to read with it this dense
================
Comment at: lib/Target/AMDGPU/AMDGPUSubtarget.cpp:327-329
@@ +326,5 @@
+ Requested.first > getMaxNumActiveWavesPerEU()) {
+ Ctx.emitError("invalid amdgpu-num-active-waves-per-eu: "
+ "requested minimum number of active waves per execution unit "
+ "is not supported by the subtarget");
+ return Default;
----------------
This should use DiagnosticInfoUnsupported. Although on second thought these should probably just clamp and diagnosis in clang
================
Comment at: lib/Target/AMDGPU/AMDGPUSubtarget.cpp:333-335
@@ +332,5 @@
+ if (Requested.second > getMaxNumActiveWavesPerEU()) {
+ Ctx.emitError("invalid amdgpu-num-active-waves-per-eu: "
+ "requested maximum number of active waves per execution unit "
+ "is not supported by the subtarget");
+ return Default;
----------------
Ditto
================
Comment at: lib/Target/AMDGPU/AMDGPUSubtarget.cpp:342-345
@@ +341,6 @@
+ Requested.second > MaxImpliedByFlatWorkGroupSize) {
+ Ctx.emitError("invalid amdgpu-num-active-waves-per-eu: "
+ "requested number of active waves per execution unit cannot "
+ "be achieved with default or requested flat work group "
+ "sizes");
+ return Default;
----------------
Ditto
================
Comment at: lib/Target/AMDGPU/AMDGPUSubtarget.h:445
@@ +444,3 @@
+ /// subtarget's specifications.
+ std::pair<unsigned, unsigned> getFlatWorkGroupSizes(const Function &F) const;
+
----------------
kzhuravl wrote:
> The reason I put this function here is because it either returns flat work group size for a given function or subtarget's defaults, but I am not sure if this is the right place.
We probably shouldn't be using IR classes in the subtarget, since there's a desire to someday be able to throw away the IR during codegen, but since there's already one thing already doing this here is fine for now
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:32
@@ -31,3 +31,3 @@
SIInstrInfo::SIInstrInfo(const AMDGPUSubtarget &st)
- : AMDGPUInstrInfo(st), RI() {}
+ : AMDGPUInstrInfo(st), RI(st) {}
----------------
This doesn't look right
================
Comment at: lib/Target/AMDGPU/SIMachineFunctionInfo.cpp:51-52
@@ -50,4 +50,4 @@
ReturnsVoid(true),
- MaximumWorkGroupSize(0),
- DebuggerReservedVGPRCount(0),
+ FlatWorkGroupSizes(std::pair<unsigned, unsigned>(0, 0)),
+ NumActiveWavesPerEU(std::pair<unsigned, unsigned>(0, 0)),
LDSWaveSpillSize(0),
----------------
I think just 0, 0 or 0u, 0u should work
================
Comment at: lib/Target/AMDGPU/SIMachineFunctionInfo.h:66
@@ +65,3 @@
+ std::pair<unsigned, unsigned> FlatWorkGroupSizes;
+ // A pair of default/requested minimum/maximum number of active waves per
+ // execution unit. Minimum - first, maximum - second.
----------------
Separating line
================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:214
@@ -276,4 +213,3 @@
MachineFunction *MF = MBB->getParent();
- const AMDGPUSubtarget &Subtarget = MF->getSubtarget<AMDGPUSubtarget>();
- const TargetInstrInfo *TII = Subtarget.getInstrInfo();
+ const TargetInstrInfo *TII = ST.getInstrInfo();
----------------
SIInstrInfo
================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:236
@@ -299,5 +235,3 @@
MachineFunction *MF = MBB->getParent();
- const AMDGPUSubtarget &Subtarget = MF->getSubtarget<AMDGPUSubtarget>();
- const SIInstrInfo *TII
- = static_cast<const SIInstrInfo *>(Subtarget.getInstrInfo());
+ const SIInstrInfo *TII = static_cast<const SIInstrInfo *>(ST.getInstrInfo());
----------------
You can remove the cast
================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.h:28
@@ -27,2 +27,3 @@
private:
+ const AMDGPUSubtarget &ST;
unsigned SGPR32SetID;
----------------
I don't think the RegisterInfo should have a reference to the Subtarget
================
Comment at: test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-many-regs.ll:7-18
@@ +6,14 @@
+; CHECK: NumVGPRsForNumActiveWavesPerEU: 32
+define void @many_regs_128_256(i32 addrspace(1)* nocapture %A) #0 {
+entry:
+ %a = alloca [111 x i32], align 4
+ %0 = bitcast [111 x i32]* %a to i8*
+ call void @llvm.lifetime.start(i64 444, i8* %0) #2
+ %1 = load i32, i32 addrspace(1)* %A, align 4, !tbaa !7
+ %arrayidx1 = getelementptr inbounds [111 x i32], [111 x i32]* %a, i32 0, i32 0
+ store i32 %1, i32* %arrayidx1, align 4, !tbaa !7
+ %arrayidx.1 = getelementptr inbounds i32, i32 addrspace(1)* %A, i64 1
+ %2 = load i32, i32 addrspace(1)* %arrayidx.1, align 4, !tbaa !7
+ %arrayidx1.1 = getelementptr inbounds [111 x i32], [111 x i32]* %a, i32 0, i32 1
+ store i32 %2, i32* %arrayidx1.1, align 4, !tbaa !7
+ %arrayidx.2 = getelementptr inbounds i32, i32 addrspace(1)* %A, i64 2
----------------
Can you run instnamer on this and use CHECK-LABEL
================
Comment at: test/CodeGen/AMDGPU/attr-amdgpu-num-active-waves-per-eu-invalid.ll:8
@@ +7,3 @@
+}
+attributes #0 = {"amdgpu-num-active-waves-per-eu"="10,1"}
+
----------------
Some tests with unparsable sizes would be good
http://reviews.llvm.org/D21562
More information about the llvm-commits
mailing list