[PATCH] D102954: [AMDGPU] Lower kernel LDS into a sorted structure

Mon May 24 14:23:40 PDT 2021

rampitec added inline comments.

================
Comment at: llvm/lib/IR/Value.cpp:549
+    if (auto *C = dyn_cast<Constant>(U.getUser())) {
+      if (!isa<GlobalValue>(C)) {
+        C->handleOperandChange(this, New);
----------------
hsmhsm wrote:
> I do not understand, why are we not considering "GlobalVlaue" here. It is upto caller who calls Value::replaceUsesWithIf() to decide on it, and this caller is the one who decide on the logic of the callback of ShouldReplace(). If this caller decide to replace "this" by "New" even in global scope (which is probably an initializer to global), Value::replaceUsesWithIf() should honor it.
This is simply a Constant handling code borrowed from replaceAllUsesWith(). See line 514 above.

================
Comment at: llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp:39
+    return all_of(U->users(), [F](const User *U) {
+      return isUsedOnlyFromFunction(U, F);
+    });
----------------
hsmhsm wrote:
> hsmhsm wrote:
> > For me, this logic looks incomplete in general sense. Constants are constants throughout compilation phase, meaning, once a constant is created, it is reused in multiple contexts when required. So this is possible case.
> > 
> > void kern_0() {
> >   Inst_1(Const_1(Const_3(LDS)));
> > }
> > 
> > void kern_1() {
> >   Inst_1(Const_2(Const_3(LDS)));
> > }
> > 
> > Here Inst_1 and Inst_2 are two different instances, even if they are exactly same, but Const_3 is one-and-the same for both kern_0() and kern_1().  Will this case works here?
> Further, I verified that below test-case misses lowering because of above incomplete logic, but lowering is required here.
> 
> ```
> @kern = addrspace(3) global float undef, align 4
> 
> define amdgpu_kernel void @k0() {
>   %ld = load i32, i32* inttoptr (i64 add (i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64), i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64)) to i32*), align 4
>   ret void
> }
> 
> define amdgpu_kernel void @k1() {
>   %ld = load i32, i32* inttoptr (i64 add (i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64), i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64)) to i32*), align 4
>   ret void
> }
> ```
Right. This is exactly the case checked in the lower-kernel-lds-constexpr.ll, k0 and k1. Two kernels use the same constant and that cannot be lowered because we cannot replace the constant in one kernel without affecting another one. Handling of this case would require a transformation replacing a constant with an instruction. That is possible but not trivial and not handled in this pass.

At the same time lowering is NOT required. A safe action for kernel LDS lowering is to bail and not touch anything. This is optimization and not required for correct codegen.

================
Comment at: llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp:39
+    return all_of(U->users(), [F](const User *U) {
+      return isUsedOnlyFromFunction(U, F);
+    });
----------------
rampitec wrote:
> hsmhsm wrote:
> > hsmhsm wrote:
> > > For me, this logic looks incomplete in general sense. Constants are constants throughout compilation phase, meaning, once a constant is created, it is reused in multiple contexts when required. So this is possible case.
> > > 
> > > void kern_0() {
> > >   Inst_1(Const_1(Const_3(LDS)));
> > > }
> > > 
> > > void kern_1() {
> > >   Inst_1(Const_2(Const_3(LDS)));
> > > }
> > > 
> > > Here Inst_1 and Inst_2 are two different instances, even if they are exactly same, but Const_3 is one-and-the same for both kern_0() and kern_1().  Will this case works here?
> > Further, I verified that below test-case misses lowering because of above incomplete logic, but lowering is required here.
> > 
> > ```
> > @kern = addrspace(3) global float undef, align 4
> > 
> > define amdgpu_kernel void @k0() {
> >   %ld = load i32, i32* inttoptr (i64 add (i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64), i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64)) to i32*), align 4
> >   ret void
> > }
> > 
> > define amdgpu_kernel void @k1() {
> >   %ld = load i32, i32* inttoptr (i64 add (i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64), i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64)) to i32*), align 4
> >   ret void
> > }
> > ```
> Right. This is exactly the case checked in the lower-kernel-lds-constexpr.ll, k0 and k1. Two kernels use the same constant and that cannot be lowered because we cannot replace the constant in one kernel without affecting another one. Handling of this case would require a transformation replacing a constant with an instruction. That is possible but not trivial and not handled in this pass.
> 
> At the same time lowering is NOT required. A safe action for kernel LDS lowering is to bail and not touch anything. This is optimization and not required for correct codegen.
> For me, this logic looks incomplete in general sense. Constants are constants throughout compilation phase, meaning, once a constant is created, it is reused in multiple contexts when required. So this is possible case.
> 
> void kern_0() {
>   Inst_1(Const_1(Const_3(LDS)));
> }
> 
> void kern_1() {
>   Inst_1(Const_2(Const_3(LDS)));
> }
> 
> Here Inst_1 and Inst_2 are two different instances, even if they are exactly same, but Const_3 is one-and-the same for both kern_0() and kern_1().  Will this case works here?

Yes, this is the situation handled by handleOperandChange(), that is why a normal setUse() cannot be used in the first place. Added test k3 to the lower-kernel-lds-constexpr.ll.

================
Comment at: llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp:47
+bool shouldLowerLDSToStruct(const SmallPtrSetImpl<GlobalValue *> &UsedList,
+                            const GlobalVariable &GV, const Function *F) {
   // Any LDS variable can be lowered by moving into the created struct
----------------
hsmhsm wrote:
> This is fine, but combining the testing of two different requirements for both module lowering and kernel lowering within same function looks maintenance headache to me. May be it is useful to separate them into two different functions, and findVariablesToLower() call one of them, based on F. But it is upto you. I am fine either way.
> This is fine, but combining the testing of two different requirements for both module lowering and kernel lowering within same function looks maintenance headache to me. May be it is useful to separate them into two different functions, and findVariablesToLower() call one of them, based on F. But it is upto you. I am fine either way.

I did not want to clone practically identical code. I hope after a series of changes there will be no module LDS at all, it will all go into a per-kernel allocation.

================
Comment at: llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll:44
 define amdgpu_kernel void @timestwo() {
   %ld = load i32, i32* inttoptr (i64 add (i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*) to i32*) to i64), i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64)) to i32*), align 4
   %mul = mul i32 %ld, 2
----------------
hsmhsm wrote:
> `@kern` is not lowered here, and same below at line 46.
What do you mean? @kern is lowered. This is input IR, the output does not have it.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102954/new/

https://reviews.llvm.org/D102954