[PATCH] D109594: [AMDGPU] Initialize LDS pointers after alloca, but before call.

Jon Chesterfield via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 13 04:13:34 PDT 2021


JonChesterfield added a comment.

In D109594#2997110 <https://reviews.llvm.org/D109594#2997110>, @foad wrote:

> It's a QOI thing. You want to try hard to leave allocas in the entry block if possible, because LLVM convention is that allocas in the entry block are static...

That sounds right. However, this transform moves calls out of the entry block, so it will itself leave allocas outside the entry block if those calls are inlined.

Alternatives are to emit the stores (probably as relaxed atomic) in the entry block, such that every lane executes them and we don't split the CFG, or to add a fairly late pass that hoists allocas into the entry block.

I'd be inclined to do both. Hoisting 'dynamic' allocas into the entry block will fix some miscompilations (I haven't looked recently, but about six months ago an alloca outside of entry was an error in the backend) and/or make things faster. Emitting the store from all lanes instead of branching means, well, less branching, and it also keeps the entry block as a single block rather than splitting it into a CFG.
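To make the hoisting idea concrete, here is a minimal sketch on a toy IR. It is not the LLVM API: ToyInst, ToyBlock, ToyFunction and hoistAllocasToEntry are hypothetical names invented for illustration. It assumes that only constant-size allocas are safe to move, and it skips the legality checks a real pass would need (for example, an alloca in a loop whose storage must stay distinct per iteration).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for IR. These are NOT LLVM classes.
struct ToyInst {
    std::string text;      // printable form, e.g. "%p = alloca i32"
    bool isAlloca;         // true for alloca instructions
    bool hasConstantSize;  // only fixed-size allocas can become static
};

struct ToyBlock {
    std::string name;
    std::vector<ToyInst> insts;
};

struct ToyFunction {
    std::vector<ToyBlock> blocks;  // blocks[0] is the entry block
};

// Move every constant-size alloca found outside the entry block into the
// entry block, placing it after the allocas that already lead that block.
// Returns the number of allocas moved.
std::size_t hoistAllocasToEntry(ToyFunction &f) {
    assert(!f.blocks.empty() && "function needs an entry block");
    ToyBlock &entry = f.blocks.front();

    // Find the end of the leading run of allocas in the entry block.
    std::size_t insertAt = 0;
    while (insertAt < entry.insts.size() && entry.insts[insertAt].isAlloca)
        ++insertAt;

    std::size_t moved = 0;
    for (std::size_t b = 1; b < f.blocks.size(); ++b) {
        std::vector<ToyInst> kept;  // instructions that stay in this block
        for (const ToyInst &inst : f.blocks[b].insts) {
            if (inst.isAlloca && inst.hasConstantSize) {
                entry.insts.insert(
                    entry.insts.begin() + static_cast<std::ptrdiff_t>(insertAt),
                    inst);
                ++insertAt;
                ++moved;
            } else {
                kept.push_back(inst);
            }
        }
        f.blocks[b].insts = kept;
    }
    return moved;
}
```

In this model a variable-size alloca stays where it is, since it is genuinely dynamic and hoisting it would not make it static.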

If an atomic store of a uniform value is better expressed as masking off all lanes but one, I suspect we're better off doing that transform once exec is available for manipulation. Somewhere in MIR.
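As a rough illustration of why the two forms are interchangeable for a uniform value, the sketch below models a 64-lane wavefront with a plain bitmask. Everything here is an assumption for illustration: the function names are hypothetical, the exec mask is just a std::uint64_t, and real hardware semantics (atomicity, memory ordering) are not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: bit i of `exec` set means lane i is active.
// Every active lane stores the same (uniform) value to a single slot.
void storeFromAllActiveLanes(std::uint64_t exec, std::uint32_t value,
                             std::uint32_t &slot, unsigned &storesIssued) {
    for (unsigned lane = 0; lane < 64; ++lane) {
        if (exec & (std::uint64_t{1} << lane)) {
            slot = value;
            ++storesIssued;
        }
    }
}

// Mask off every active lane except the lowest one, then store once.
void storeFromFirstActiveLane(std::uint64_t exec, std::uint32_t value,
                              std::uint32_t &slot, unsigned &storesIssued) {
    std::uint64_t lowest = exec & (~exec + 1);  // isolate the lowest set bit
    assert((lowest & (lowest - 1)) == 0 && "at most one lane remains active");
    storeFromAllActiveLanes(lowest, value, slot, storesIssued);
}
```

Both functions leave the same value in the slot; they differ only in how many stores are issued, which is why the choice can be deferred to a point where the exec mask can be manipulated directly.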


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109594/new/

https://reviews.llvm.org/D109594


