[PATCH] D109870: [AMDGPU] Enable the pass "amdgpu-replace-lds-use-with-pointer"

Wed Sep 22 09:16:08 PDT 2021

hsmhsm added a comment.

In D109870#3015473 <https://reviews.llvm.org/D109870#3015473>, @arsenm wrote:

> In D109870#3014083 <https://reviews.llvm.org/D109870#3014083>, @hsmhsm wrote:
>
>> At the moment, it is not *very clear* if LLVM IR with static alloca after call
>> is legal or not.
>
> Calls have absolutely nothing to do with allocas. Allocas may legally appear anywhere, this isn't in question. Allocas are strongly recommended to be placed in the entry block to enable optimizations. Further, clustering them at the top of the entry block is nicer IR. This is in no way a requirement, but an optimization pass that doesn't strictly need to touch every alloca doesn't need to find every alloca
>
>> In this pass, since we need to split the entry block before any call, any static
>> alloca after the call will be converted into dynamic alloca which we want to avoid.
>>
>> So, for now, we skip running this pass, if there a kernel which has static
>> alloca after call, and we will revisit this code later once we have clear clarity
>> on the placement of static alloca.
>
> This pass has no reason to inspect calls. What you care about is splitting the entry block to insert some initialization code. In the process of doing this, you want to avoid moving allocas out of the entry block. This is trivial to do if all the allocas are clustered at the top of the block. If you ignore allocas beyond the top cluster, there is nothing wrong with the IR. If you happen to sink some out of the entry block, it's still correct, just suboptimal.

Agreed cent percent.

Further, I have pushed CFE patch at  https://reviews.llvm.org/D110257 which fixes the problem where the contiguity of static allocas at the very top of the entry block was broken.  Once this patch makes it's way to trunk, all I need to do here in this patch is to just scan the very top static alloca cluster (as an optimization), and split the block just after the cluster.

If any other static alloca which is not part of the top cluster (for some in-efficient reason) happens to move out of entry block, and which exposes hidden bug in AMDGPU back-end, then it is the *not* the responsibility of this patch. It is a different issue to be fixed, and blaming this patch for it is being exposing the hidden bug is not at all acceptable anymore.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109870/new/

https://reviews.llvm.org/D109870