[PATCH] D102954: [AMDGPU] Lower kernel LDS into a sorted structure

Mahesha S via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon May 24 21:24:20 PDT 2021


hsmhsm added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp:39
+    return all_of(U->users(), [F](const User *U) {
+      return isUsedOnlyFromFunction(U, F);
+    });
----------------
rampitec wrote:
> rampitec wrote:
> > hsmhsm wrote:
> > > hsmhsm wrote:
> > > > For me, this logic looks incomplete in general sense. Constants are constants throughout compilation phase, meaning, once a constant is created, it is reused in multiple contexts when required. So this is possible case.
> > > > 
> > > > void kern_0() {
> > > >   Inst_1(Const_1(Const_3(LDS)));
> > > > }
> > > > 
> > > > void kern_1() {
> > > >   Inst_1(Const_2(Const_3(LDS)));
> > > > }
> > > > 
> > > > Here Inst_1 and Inst_2 are two different instances, even if they are exactly same, but Const_3 is one-and-the same for both kern_0() and kern_1().  Will this case works here?
> > > Further, I verified that below test-case misses lowering because of above incomplete logic, but lowering is required here.
> > > 
> > > ```
> > > @kern = addrspace(3) global float undef, align 4
> > > 
> > > define amdgpu_kernel void @k0() {
> > >   %ld = load i32, i32* inttoptr (i64 add (i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64), i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64)) to i32*), align 4
> > >   ret void
> > > }
> > > 
> > > define amdgpu_kernel void @k1() {
> > >   %ld = load i32, i32* inttoptr (i64 add (i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64), i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64)) to i32*), align 4
> > >   ret void
> > > }
> > > ```
> > Right. This is exactly the case checked in the lower-kernel-lds-constexpr.ll, k0 and k1. Two kernels use the same constant and that cannot be lowered because we cannot replace the constant in one kernel without affecting another one. Handling of this case would require a transformation replacing a constant with an instruction. That is possible but not trivial and not handled in this pass.
> > 
> > At the same time lowering is NOT required. A safe action for kernel LDS lowering is to bail and not touch anything. This is optimization and not required for correct codegen.
> > For me, this logic looks incomplete in general sense. Constants are constants throughout compilation phase, meaning, once a constant is created, it is reused in multiple contexts when required. So this is possible case.
> > 
> > void kern_0() {
> >   Inst_1(Const_1(Const_3(LDS)));
> > }
> > 
> > void kern_1() {
> >   Inst_1(Const_2(Const_3(LDS)));
> > }
> > 
> > Here Inst_1 and Inst_2 are two different instances, even if they are exactly same, but Const_3 is one-and-the same for both kern_0() and kern_1().  Will this case works here?
> 
> Yes, this is the situation handled by handleOperandChange(), that is why a normal setUse() cannot be used in the first place. Added test k3 to the lower-kernel-lds-constexpr.ll.
Why do we need a transformation replacing a constant with an instruction? I am not sure. 

**My understanding is this** - 

In the above example, defintely `@kern` is suppose to be lowered both within kernels `@k0` and `@k1`.  But since the struct offset for `@kern`  would be different within both the kernels (assuming they also use other LDS globals), the current lowering flow - that is handling one kernel at a time is the limitation here as I understand it.

**Instead let's assume below flow, then it is possible to achieve it. But, only problem is, this flow is overkill since we need to do some heavy work here which is bit against LLVM infrastructure.**

```
1.  ConsolidatedGlobals = nullptr

2.  FOR (each kernel K) DO
        LDSGlobals = Collect LDS globals to be lowered
        Create struct type and struct instance for K, and save it.
        Set union LDSGlobals with ConsolidatedGlobals
     ENDFOR

3.   Call removeFromUsedList for ConsolidatedGlobals

4.   Construct an indexing table which given an LDS and a Kernel, it returns the corresponding struct offset.
 
5.   FOR (each lds global GV from ConsolidatedGlobals) DO
         FOR (each kernel K) DO
              Offset = Get offset for the pair (GV, K) from indexing table.
              GEF = Create a GEF constant expression using Offset.
              KernelUses = Collect all uses of GV within kernel K.
              FOR (each use U from KernelUses) DO
                  Replace GV within U by GEF
              ENDFOR
         ENDFOR
      ENDFOR

6.   FOR (each global GV from ConsolidatedGlobals) DO
         Erase GV from module.
      ENDFOR
```


================
Comment at: llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll:44
 define amdgpu_kernel void @timestwo() {
   %ld = load i32, i32* inttoptr (i64 add (i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*) to i32*) to i64), i64 ptrtoint (i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @kern to i32 addrspace(3)*) to i32*) to i64)) to i32*), align 4
   %mul = mul i32 %ld, 2
----------------
rampitec wrote:
> hsmhsm wrote:
> > `@kern` is not lowered here, and same below at line 46.
> What do you mean? @kern is lowered. This is input IR, the output does not have it.
My bad, I misunderstood it, please ignore it.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102954/new/

https://reviews.llvm.org/D102954



More information about the llvm-commits mailing list