[llvm] [AMDGPU] Introduce "amdgpu-uniform-intrinsic-combine" pass to combine uniform AMDGPU lane Intrinsics. (PR #116953)

Wed Sep 3 08:51:47 PDT 2025

https://github.com/nhaehnle commented:

We've had an offline discussion about this, and I'm confident that the transforms this pass wants to do are correct, for reasons that were more or less already stated in the discussion.

It's true that this pass has transforms that can lead to a situation where some instruction whose operand was previously recognized as statically uniform is later on no longer recognized as statically uniform. However, the semantics of how programs execute don't (and must not, for this precise reason[0]) care about static uniformity; they only ever care about dynamic uniformity. And every instruction that's downstream and cares about dynamic uniformity must be convergent (and isel will introduce v_readfirstlane for them if their operands can't be proven statically uniform).

So correctness really only requires one observation: For all transforms in this pass, it is intuitively clear and should be possible to formally prove[1] locally that, given the same initial state before the transformed code, the state after the transformed code is the same as it is for the original program.

There could still be unintended negative performance side effects (e.g. longer live ranges of VGPRs vs. SGPRs), but I don't see any fundamental correctness issues.

What does make me uneasy about the pass as it is currently written is that it interleaves changes to the IR with queries from UniformityInfo, and so later queries might get stale results. This is more of an engineering issue though and not a fundamental problem.

What I would suggest to solve this is to systematically split the pass into two halves so that all UniformityInfo queries happen before all changes to the IR. The first half iterates over intrinsic instructions, checking UniformityInfo and building a list of all the instructions that can be transformed, and the second half simply goes over the list and applies the changes.
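
To make the suggested split concrete, here is a minimal sketch of the collect-then-apply shape. It deliberately uses a toy IR rather than LLVM's API: `ToyInst`, `collectCandidates`, and `applyRewrites` are hypothetical names invented for illustration, and the `OperandIsUniform` flag stands in for whatever a UniformityInfo query would report on the unmodified IR. The "copy" rewrite is likewise only a placeholder for the pass's real combines.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an instruction; this is NOT an LLVM type.
struct ToyInst {
  std::string Opcode;     // e.g. "readfirstlane", "add"
  bool OperandIsUniform;  // stand-in for the analysis result on the unmodified IR
};

// Phase 1: read-only. Query the (toy) uniformity result for every
// instruction and record which ones can be transformed. Nothing is mutated.
std::vector<std::size_t> collectCandidates(const std::vector<ToyInst> &Insts) {
  std::vector<std::size_t> Worklist;
  for (std::size_t I = 0; I < Insts.size(); ++I)
    if (Insts[I].Opcode == "readfirstlane" && Insts[I].OperandIsUniform)
      Worklist.push_back(I);
  return Worklist;
}

// Phase 2: mutate only. No uniformity queries happen here, so no query can
// observe a result that an earlier rewrite has made stale.
// Returns the number of instructions rewritten.
std::size_t applyRewrites(std::vector<ToyInst> &Insts,
                          const std::vector<std::size_t> &Worklist) {
  for (std::size_t I : Worklist) {
    assert(I < Insts.size() && "worklist index out of range");
    Insts[I].Opcode = "copy";  // placeholder rewrite for a uniform lane read
  }
  return Worklist.size();
}
```

A caller would run `collectCandidates` once over the whole function and then hand the resulting list to `applyRewrites`. The trade-off is that a rewrite cannot expose new candidates within the same run, which is presumably acceptable as long as the analysis cannot be updated incrementally.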

An alternative would be to make the UniformityInfo updatable. I think that's a good idea either way, but you may not want to make it a prerequisite for this change.

[0] We already have transforms that can do this. The main one that comes to mind is where you have code like:
```
y = readfirstlane(x)
if (x == y) {
  use(y)
}
```
... and a well-intentioned but ultimately misguided generic transform replaces the use of `y` with a use of `x`.
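
A toy simulation of one wave may help illustrate why that replacement changes nothing dynamically, even though an analysis might treat `y` (a readfirstlane result) as statically uniform and `x` as not. Everything here is a made-up model rather than LLVM or hardware behavior: `readFirstLane`, `runSnippet`, and `kNotTaken` are hypothetical names, and the sketch assumes a non-empty wave with all lanes active on entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Marker for lanes that do not enter the branch (assumed not to collide
// with any real value in the example inputs).
constexpr int kNotTaken = -1;

// Toy readfirstlane: returns x from the first active lane of the wave.
int readFirstLane(const std::vector<int> &X, const std::vector<bool> &Active) {
  for (std::size_t I = 0; I < X.size(); ++I)
    if (Active[I])
      return X[I];
  assert(false && "no active lane");
  return 0;
}

// Simulates `y = readfirstlane(x); if (x == y) use(...)` for one wave and
// returns, per lane, the value passed to use(), or kNotTaken if the lane
// skips the branch. UseXInsteadOfY selects the transformed program.
std::vector<int> runSnippet(const std::vector<int> &X, bool UseXInsteadOfY) {
  std::vector<bool> Active(X.size(), true);  // assumption: all lanes active
  const int Y = readFirstLane(X, Active);
  std::vector<int> Used(X.size(), kNotTaken);
  for (std::size_t I = 0; I < X.size(); ++I)
    if (X[I] == Y)  // only lanes whose x equals y enter the branch
      Used[I] = UseXInsteadOfY ? X[I] : Y;
  return Used;
}
```

For per-lane inputs such as `{7, 3, 7, 9}`, both variants pass 7 to `use` in lanes 0 and 2 and skip lanes 1 and 3. The value reaching `use` is the same in every lane that takes the branch in either program, so the transformed code is still dynamically uniform there.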

[1] I'm not actually versed enough in the relevant PL theory. There's a valiant approach by Pankaj to write down a proper proof, and while I believe that proof is correct intuitively, I'm uneasy about the formal details. If we did want to get to a proper formal theory, I suspect the way to go would be some further development of separation logic that can talk about convergence.

https://github.com/llvm/llvm-project/pull/116953