[llvm] [AMDGPU] Introduce "amdgpu-uniform-intrinsic-combine" pass to combine uniform AMDGPU lane Intrinsics. (PR #116953)

Tue Apr 15 18:24:41 PDT 2025

ssahasra wrote:

Tagging @nhaehnle 

I think I now understand the concern that @arsenm raised. At some static instance in a program, say an operation X uses a value defined by another operation Y. The uniformity of Y depends on its own input operands as well as on the set of threads that are converged at Y, which in turn depends on the control paths that reach Y. If we make a decision at X that depends on Y being uniform, then we are effectively freezing the convergence at Y and choosing only those executions of the whole program where this convergence is preserved at Y. This information needs to be recorded somewhere (maybe at X or at Y or both). In particular, if X is "always uniform" as in the current optimization, then replacing X with Y means that all possible executions of the new program must guarantee that Y is also always uniform.

So we cannot simply check for uniform values and optimize based on that. Every such optimization has to restrict the possible executions of the optimized program.
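
To make the concern concrete, here is a rough toy model in Python, not LLVM code. It assumes X behaves like a readfirstlane-style broadcast over the lanes converged at that point, and that Y produces the hypothetical per-lane values shown. All function names and the 4-lane wave are made up for illustration.

```python
def is_uniform(values, converged):
    """True if every lane in the converged set holds the same value."""
    return len({values[lane] for lane in converged}) <= 1


def read_first_lane(values, converged):
    """Toy model of X: broadcast the lowest converged lane's value."""
    first = values[min(converged)]
    return {lane: first for lane in converged}


def use_y_directly(values, converged):
    """Toy model of the optimized program, where X was replaced by Y."""
    return {lane: values[lane] for lane in converged}


# Hypothetical per-lane results of Y for a 4-lane wave (an assumption).
y = [5, 5, 7, 7]

# Execution A: only lanes 0 and 1 are converged at Y, so Y is uniform there.
narrow = {0, 1}
# Execution B: all four lanes are converged at Y, so Y is divergent.
wide = {0, 1, 2, 3}

for name, converged in (("narrow", narrow), ("wide", wide)):
    same = read_first_lane(y, converged) == use_y_directly(y, converged)
    print(name, "| Y uniform:", is_uniform(y, converged),
          "| rewrite preserves result:", same)
```

In this sketch, replacing X with Y is only sound in the execution where the narrower convergence at Y holds. If some other transformation later lets all four lanes converge at Y, the rewritten program observes divergent values where the original X was still uniform.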

https://github.com/llvm/llvm-project/pull/116953