[PATCH] D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.

Mahesha S via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Jun 13 23:37:38 PDT 2021


hsmhsm marked an inline comment as done.
hsmhsm added a comment.

In D103225#2810460 <https://reviews.llvm.org/D103225#2810460>, @b-sumner wrote:

> I believe that for the purposes of detecting lane 0 mbcnt_lo is sufficient.

Patch is **again** updated accordingly.

In D103225#2811100 <https://reviews.llvm.org/D103225#2811100>, @rampitec wrote:

> In D103225#2809846 <https://reviews.llvm.org/D103225#2809846>, @hsmhsm wrote:
>
>> Insert @llvm.amdgcn.mbcnt.hi() for wave64 mode.
>
> Actually I think you were right. mbcnt_lo will always return -1 for any lane higher than 31.

Now I again updated the patch - mbcnt_lo itself is enough to detect lane 0 even in wave64 mode.



================
Comment at: llvm/test/CodeGen/AMDGPU/replace-lds-by-ptr-call-diamond-shape.ll:15
+; Pointer should be created.
+; CHECK: @lds_used_within_func.ptr = internal unnamed_addr addrspace(3) global i16 undef, align 2
+
----------------
rampitec wrote:
> hsmhsm wrote:
> > rampitec wrote:
> > > This likely should not be i16. You could do it this way:
> > > ```
> > > @lds_used_within_func.ptr = internal unnamed_addr addrspace(3) global i8 addrspace(3)* undef, align 2
> > > ```
> > > Then use bitcast to i8 addrspace(3)* instead of ptrtoint on store.
> > > 
> > > But I still think that creating module lds right here is better.
> > The main goal of using i16 as pointer (with ptrtoint and GEP) is to save memory as much as possible. Using  "i8 addrspace(3)*" means we are going to allocate memory size for pointer (not sure if it is 4 or 8 bytes, but definitely greater than 2 bytes). Hence the use of i16 type here.
> > 
> > Also, I do not think, it is a good idea to create module lds right here. If that was the case, we could have put this code within module LDS pass itself instead of having separate pass (I did it initially, later we decided that pointer-replacement code should go as separate pass).
> > 
> > This pass is already too big, let's not change the design again and further complicate the stuff which will further delay code submit, and which leads to other serious problems.
> > The main goal of using i16 as pointer (with ptrtoint and GEP) is to save memory as much as possible. Using  "i8 addrspace(3)*" means we are going to allocate memory size for pointer (not sure if it is 4 or 8 bytes, but definitely greater than 2 bytes). Hence the use of i16 type here.
> 
> Should not sizeof local pointer be 2? Checked the DL: p3:32:32. Sigh!
> 
> > Also, I do not think, it is a good idea to create module lds right here. If that was the case, we could have put this code within module LDS pass itself instead of having separate pass (I did it initially, later we decided that pointer-replacement code should go as separate pass).
> > 
> > This pass is already too big, let's not change the design again and further complicate the stuff which will further delay code submit, and which leads to other serious problems.
> 
> My concern is that you assume these i16 pointers will start a null, but that might not be the case if you rely on the module LDS pass to pack them into the struct along with other LDS and let it sort it.
> 
> Can you add a test where you have these pointers and also another module LDS which will have higher alignment and size, so module LDS will pack them together? We could then inspect output of 2 passes and the ISA.
No, what you concern here is not at all arise. Please have a look the new test case - replace-lds-by-ptr-lds-offsets.ll. I have given detailed description within this test, and I hope, it will clarify your above doubt.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103225/new/

https://reviews.llvm.org/D103225



More information about the llvm-commits mailing list