[llvm] [AMDGPU] Eliminate likely-spurious execz checks via intrinsic argument (PR #123749)
Fabian Ritter via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 6 03:30:08 PST 2025
ritter-x2a wrote:
I found a case where the heuristic fails in practice: When it's applied to conditionals introduced by [BypassSlowDivision](https://github.com/llvm/llvm-project/blob/013f4a46d1978e370f940df3cbd04fb0399a04fe/llvm/lib/Transforms/Utils/BypassSlowDivision.cpp) as part of the codegenprepare pass.
This transformation essentially replaces a `udiv i64 A, B` with `(do values fit in i32) ? (udiv i32 A, B) : (udiv i64 A, B)`. I'd guess that this condition usually doesn't diverge within a wavefront.
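To make the shape of that conditional concrete, here is a minimal source-level sketch of what the rewritten division computes. It is an illustration only, not the pass's actual output: the pass works on IR, and the function name `udiv64_bypass` is hypothetical. The "fits in i32" test is written here as a check that the OR of both operands has no bits set in the upper 32 bits.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): source-level view of a 64-bit
// unsigned division guarded by a cheap "do both operands fit in 32 bits" test.
constexpr uint64_t udiv64_bypass(uint64_t a, uint64_t b) {
  // Fast path: neither operand has any bit set in its upper 32 bits,
  // so a 32-bit division yields the same quotient.
  if (((a | b) >> 32) == 0) {
    return static_cast<uint32_t>(a) / static_cast<uint32_t>(b);
  }
  // Slow path: full 64-bit division (the lengthy expansion on AMDGPU).
  return a / b;
}
```

If every lane of a wavefront takes the same side of this branch, one of the two division sequences has no active lanes at all, which is exactly the situation an execz check is meant to skip.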
With the current heuristic, whenever the values depend on the thread ID, we would always "execute" (with a potentially empty EXEC mask) both the very lengthy code for i64 division and the code for i32 division.
I have code where that costs 10% performance.
https://github.com/llvm/llvm-project/pull/123749