[clang] [llvm] [llvm][AMDGPU] Fold `llvm.amdgcn.wavefrontsize` early (PR #114481)

Jon Chesterfield via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 7 09:21:36 PST 2024


JonChesterfield wrote:

Awesome! This is absolutely something that has been on my todo stack for ages and it's very good to see someone else writing the thing.

It looks like the implementation is contentious so I'll leave that for the moment. I'm under some time constraints, so please forgive the length of the following. TL;DR: I love this and definitely want the feature.

A magic intrinsic which can be summoned from clang and used in things like if statements is the right construct for all the stuff which we know by codegen time and don't really need to care about in the front end. The code object version should probably be reified as one. The number of compute units. All the things currently handled by magic global variables in the rocm device library and relying on O1 to constant fold them out of existence. We do this, throw away the magic globals, everything is better.
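For concreteness, a rough sketch of the source-side shape (an illustration, not code from the PR). Clang exposes the intrinsic as `__builtin_amdgcn_wavefrontsize()`. The `wave_size` wrapper, the `reduction_steps` helper and the host fallback value of 64 are all hypothetical, there only so the snippet also builds off-target.

```cpp
// Hypothetical wrapper: on AMDGPU use the clang builtin; elsewhere return a
// fixed value so the sketch compiles on a host (64 is an assumption).
static inline unsigned wave_size() {
#if defined(__AMDGCN__)
  return __builtin_amdgcn_wavefrontsize();
#else
  return 64u;
#endif
}

// Hypothetical helper: halving steps needed for a wave-wide reduction.
// Once the wave size folds to a constant, only one arm of the branch survives.
unsigned reduction_steps() {
  if (wave_size() == 64)
    return 6; // log2(64)
  return 5;   // log2(32)
}
```

The point is that the `if` is ordinary control flow in the source, and the choice between the two arms is deferred until the target is known.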

The magic intrinsic can have _guaranteed_ constant folding even at O0. That kills that class of "O0 doesn't work" bugs. This means it needs something that is certain to remove it ahead of a simplifycfg pass, or for the pass which removes it to _also_ do the trivial fold of a branch. So it's not just constant folding, though having instcombine also constant fold it is fine as an optimisation.
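A toy model of "remove the intrinsic and also fold the branch", written against a made-up IR rather than LLVM's real classes. Everything here (`Inst`, `foldWaveSize`, the block names) is hypothetical; it only shows the intended behaviour, including leaving the IR untouched while the wave size is still unknown. It assumes C++17 for `std::optional`.

```cpp
#include <optional>
#include <string>
#include <vector>

// Toy IR for illustration only; these are not LLVM classes.
struct Inst {
  enum Kind { WaveSizeCondBr, Br } kind = Br;
  unsigned expected = 0; // WaveSizeCondBr: value compared against the wave size
  std::string ifTrue;    // WaveSizeCondBr: target when equal; Br: the only target
  std::string ifFalse;   // WaveSizeCondBr: target when not equal
};

// Hypothetical fold: once the wave size is known, rewrite every branch on it
// into an unconditional branch. With no target information yet, leave the IR
// alone. Returns the number of branches folded.
unsigned foldWaveSize(std::vector<Inst> &insts,
                      std::optional<unsigned> waveSize) {
  if (!waveSize)
    return 0;
  unsigned folded = 0;
  for (Inst &i : insts) {
    if (i.kind != Inst::WaveSizeCondBr)
      continue;
    // Constant-fold the compare, then keep only the surviving successor.
    std::string target = (*waveSize == i.expected) ? i.ifTrue : i.ifFalse;
    i.kind = Inst::Br;
    i.expected = 0;
    i.ifTrue = target;
    i.ifFalse.clear();
    ++folded;
  }
  return folded;
}
```

Because the branch itself is rewritten, nothing downstream has to rely on a later cleanup pass running, which is the property that matters at O0.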

The real value here in my opinion is towards being able to write IR libraries that don't know or care what target they're going to run on. Either because they're associated with spir-v, or because they're the libc which currently handwaves that problem, or the rocm device libs which handwave it in a slightly different way, or the openmp runtime which currently builds K identical copies of the bitcode with different names in the spirit of correctness and stashes them in an archive. Lots of that is on sketchy QoI ground at present. But if we have a wavesize intrinsic that turns into 32 or 64 once the target is known, and hangs around in the IR until some later information about the target is revealed, we can have a single IR and predictable correct semantics. Much better than the status quo.

Thanks Alex!

https://github.com/llvm/llvm-project/pull/114481

