[PATCH] D109594: [AMDGPU] Initialize LDS pointers after alloca, but before call.

Tue Sep 14 01:17:26 PDT 2021

JonChesterfield added a comment.

In D109594#2997752 <https://reviews.llvm.org/D109594#2997752>, @hsmhsm wrote:

> Often, (probably because of too many things on the one's plate) we tend to forget what we discussed and finalized in the past except one who actually works on it.
>
> After lengthy discussion of possible ways of initializing LDS pointers at the entry block of kernels (including relaxed atomic one) we finalized about 0th lane from each wave to do initialization (which is nothing but the stores that you indicated above). So I did not take the route of relaxed atomic approach.

This doesn't sound right. I remember a preference for dropping to lane 0 on power efficiency grounds. I also remember a proposal to do that by modifying the exec mask in IR. I don't recall any discussion of the costs of splitting the entry block, or a discussion of the consequences of that if calls later in the entry block are subsequently inlined. The previous round of apply/fail/revert, and how you're trying to work around it here, suggests that is indeed a hazard.

Instead of adding, in your words, "temporary hacks", let's fix the alloca lowering so it cannot break us here or elsewhere. By iterating over the function and moving all alloca into the entry block, or by doing the call frame setup thing in the back end. Just start from the error message from llc and work backwards.

I'd also prefer we make the branch vs power tradeoff in MIR instead of IR as we generally try to avoid splitting basic blocks (since doing so usually blocks optimisations), but care less about that than about not miscompiling openmp.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109594/new/

https://reviews.llvm.org/D109594