[PATCH] D109594: [AMDGPU] Initialize LDS pointers after alloca, but before call.

Mon Sep 13 04:13:34 PDT 2021

JonChesterfield added a comment.

In D109594#2997110 <https://reviews.llvm.org/D109594#2997110>, @foad wrote:

> It's a QOI thing. You want to try hard to leave allocas in the entry block if possible, because LLVM convention is that allocas in the entry block are static...

That sounds right. However, this transform, by moving calls out of the entry block, will itself leave allocas outside the entry block if those calls are inlined.

Alternatives are to emit the stores (probably as relaxed atomics) in the entry block, so that every lane executes them and we don't split the CFG, or to add a fairly late pass that hoists allocas into the entry block.

I'd be inclined to do both. Hoisting 'dynamic' allocas into the entry block will fix some miscompilation (I haven't looked recently, but about six months ago an alloca outside of entry was an error in the backend) and/or make things faster. Emitting the store from all lanes instead of branching means, well, less branching, but it also means we don't rearrange the entry block into a CFG.
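A toy sketch of the hoisting idea, in Python rather than against the real LLVM API. The `Inst` and `Block` types and the `hoist_allocas` helper are hypothetical, and the model assumes that any constant-size alloca can be moved safely; a real pass would also have to rule out cases such as allocas whose lifetime is bounded by stacksave/stackrestore.

```python
from dataclasses import dataclass, field


@dataclass
class Inst:
    op: str                  # e.g. "alloca", "call", "store", "br"
    const_size: bool = True  # only meaningful when op == "alloca"


@dataclass
class Block:
    name: str
    insts: list = field(default_factory=list)


def hoist_allocas(blocks):
    """Move constant-size allocas from non-entry blocks into the entry block.

    blocks[0] is treated as the entry block. Hoisted allocas are placed
    after the leading run of allocas that the entry block already has.
    Returns the number of allocas moved.
    """
    entry = blocks[0]

    # Insertion point: just past the allocas already at the top of entry.
    insert_at = 0
    while insert_at < len(entry.insts) and entry.insts[insert_at].op == "alloca":
        insert_at += 1

    moved = 0
    for block in blocks[1:]:
        kept = []
        for inst in block.insts:
            if inst.op == "alloca" and inst.const_size:
                entry.insts.insert(insert_at, inst)
                insert_at += 1
                moved += 1
            else:
                # Variable-size allocas and all other instructions stay put.
                kept.append(inst)
        block.insts = kept
    return moved
```

Variable-size allocas are deliberately left where they are, since hoisting them would require their size operand to be available in the entry block.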

If an atomic store of a uniform value is better expressed as masking off all lanes but one, I suspect we're better off doing that transform once exec is available for manipulation, somewhere in MIR.
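To illustrate why the two forms are interchangeable for a uniform value, here is a small Python model, not AMDGPU codegen. The exec mask is modelled as a list of booleans, memory as a dict, and both function names are hypothetical.

```python
def store_all_lanes(memory, addr, value, exec_mask):
    """Every active lane stores the same uniform value to addr.

    Returns the number of lane-level stores performed.
    """
    stores = 0
    for active in exec_mask:
        if active:
            memory[addr] = value
            stores += 1
    return stores


def store_first_active_lane(memory, addr, value, exec_mask):
    """Mask off all lanes but the first active one, then store once.

    Returns the number of lane-level stores performed (0 or 1).
    """
    for active in exec_mask:
        if active:
            memory[addr] = value
            return 1
    return 0


# Usage: both forms leave memory in the same state for a uniform value.
mask = [False, True, True, False]
mem_a, mem_b = {}, {}
count_a = store_all_lanes(mem_a, 0x10, 42, mask)
count_b = store_first_active_lane(mem_b, 0x10, 42, mask)
```

In this model the final memory contents are identical and only the number of lane-level stores differs, which is why the choice can be deferred to a point where exec can be manipulated directly.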

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D109594