[PATCH] D15425: AMDGPU/SI: Select constant loads with non-uniform addresses to MUBUF instructions

Thu Dec 10 11:10:17 PST 2015

arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp:37-39
@@ +36,5 @@
+
+  void setAllBranchesUniform(bool Uniform) {
+    if (AllBranchesUniform) *AllBranchesUniform = Uniform;
+  }
+
----------------
This + AllBranchesUniform don't seem to be used anywhere

================
Comment at: lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp:87
@@ +86,3 @@
+
+  return false;
+}
----------------
Should return true as the conservative default if not recording when changes are actually made

================
Comment at: lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:282-283
@@ -280,2 +281,4 @@
   addPass(createSITypeRewriter());
   addPass(createSIAnnotateControlFlowPass());
+  addPass(createAMDGPUAnnotateUniformValues());
+
----------------
Why is this run after the control flow annotate pass?

================
Comment at: lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:95-96
@@ +94,4 @@
+
+static bool isIntrinsicSourceOfDivergence(const TargetIntrinsicInfo *TII,
+                                          const IntrinsicInst *I) {
+  switch (I->getIntrinsicID()) {
----------------
I think the implementation of isSourceOfDivergence should be committed as a separate patch

================
Comment at: lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:111-118
@@ +110,10 @@
+
+  StringRef Name = I->getCalledFunction()->getName();
+  switch (TII->lookupName((const char *)Name.bytes_begin(), Name.size())) {
+  default:
+    return false;
+  case AMDGPUIntrinsic::SI_tid:
+  case AMDGPUIntrinsic::SI_vs_load_input:
+  case AMDGPUIntrinsic::SI_fs_constant:
+  case AMDGPUIntrinsic::SI_fs_interp:
+    return true;
----------------
I think we should just move these into the public header so we get the enum IDs. I think mesa already  directly uses these, so it's where they belong anyway.

================
Comment at: lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:119
@@ +118,3 @@
+  case AMDGPUIntrinsic::SI_fs_interp:
+    return true;
+  }
----------------
Do image/sample instructions need to be considered divergent? We also need some of the cross lane intrinsics but I'm not sure we have those yet

================
Comment at: lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:125
@@ +124,3 @@
+/// \returns true if the result of the value could potentially be
+/// different across threads.
+bool AMDGPUTTIImpl::isSourceOfDivergence(const Value *V) const {
----------------
s/threads/workitems in a wavefront/

================
Comment at: lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:137-138
@@ +136,4 @@
+
+    if (F->getAttributes().hasAttribute(A->getArgNo(), Attribute::InReg))
+      return false;
+
----------------
I think this should be split into a function called something like isArgPassedInSGPR with a comment about how inreg is interpreted

================
Comment at: lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:141-142
@@ +140,4 @@
+    LLVMContext &Ctx = V->getContext();
+    return V->getType() == Type::getInt32Ty(Ctx) ||
+           V->getType() == Type::getFloatTy(Ctx);
+
----------------
Ty->isInt32Ty() / isFloatTy() would be better

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:514
@@ +513,3 @@
+	// UndefValue means this is a load of a kernel input.  These are uniform.
+	if (isa<UndefValue>(Ptr) || isa<Argument>(Ptr))
+		return true;
----------------
This maybe should check constant isa<Constant> which happens occasionally for LDS

================
Comment at: lib/Target/AMDGPU/SIInstructions.td:2111
@@ +2110,3 @@
+// selector to prefer those.
+let AddedComplexity = 10000 in {
+
----------------
A smaller number would be better. 10 is probably more than enough


http://reviews.llvm.org/D15425