[PATCH] D109594: [AMDGPU] Initialize LDS pointers after alloca, but before call.

Tue Sep 14 03:40:40 PDT 2021

hsmhsm added a comment.

In D109594#2999182 <https://reviews.llvm.org/D109594#2999182>, @JonChesterfield wrote:

> In D109594#2997752 <https://reviews.llvm.org/D109594#2997752>, @hsmhsm wrote:
>
>> Often, (probably because of too many things on the one's plate) we tend to forget what we discussed and finalized in the past except one who actually works on it.
>>
>> After lengthy discussion of possible ways of initializing LDS pointers at the entry block of kernels (including relaxed atomic one) we finalized about 0th lane from each wave to do initialization (which is nothing but the stores that you indicated above). So I did not take the route of relaxed atomic approach.
>
> This doesn't sound right. I remember a preference for dropping to lane 0 on power efficiency grounds. I also remember a proposal to do that by modifying the exec mask in IR. I don't recall any discussion of the costs of splitting the entry block, or a discussion of the consequences of that if calls later in the entry block are subsequently inlined. The previous round of apply/fail/revert, and how you're trying to work around it here, suggests that is indeed a hazard.
>
> Instead of adding, in your words, "temporary hacks", let's fix the alloca lowering so it cannot break us here or elsewhere. By iterating over the function and moving all alloca into the entry block, or by doing the call frame setup thing in the back end. Just start from the error message from llc and work backwards.
>
> I'd also prefer we make the branch vs power tradeoff in MIR instead of IR as we generally try to avoid splitting basic blocks (since doing so usually blocks optimisations), but care less about that than about not miscompiling openmp.

- Proposal to do lds pointer initialization by modifying the exec mask in (LLVM) IR is completely ruled out because it is a hacky stuff. At the IR level, there is no neat way of modifying and maintaining exec mask (please check with Tony), he has a clear opposition for it.
- I did not have an a priori understanding of splitting entry basic block (because we want only lane 0 to do initialization) will result in surprising results like this. Only after implementation and test failures, we realized it. So, discussing it a priori was ruled out.
- Fixing alloca lowering will be a completely separate task. And, I even do not know the complexity of this work at the moment. I also have my own doubt that if moving all alloca to the beginning of the entry basic block is so sample as you guys are mentioning here, why not anyone from LLVM community did not attempt it till date considering the importance of keeping alloca at the beginning of the entry basic block for better optimization?
- Moving to work on alloca means this LDS pass need to wait until then, and I am not sure if we really have that much luxury of time considering the importance of this pass. So, I came-up with this temporary patch until we clean-up alloca, which is a valid attempt in my opinion considering the importance of this pass. That said, if we all of us including higher technical management team is okay for this patch to wait for arbitrary time, I am fine with that too.
- This is an late LLVM IR pass, I am not getting how it is linked to MIR.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109594/new/

https://reviews.llvm.org/D109594