[llvm] [AMDGPU] Eliminate likely-spurious execz checks (PR #117567)
Fabian Ritter via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 26 05:46:10 PST 2024
ritter-x2a wrote:
@jayfoad
An example of where the execz branch **should** be eliminated would be a store in every even thread:
```cuda
__global__ void foo(int *buf, size_t n) {
  unsigned tid = blockDim.x * blockIdx.x + threadIdx.x;
  if (tid % 2 == 0)
    buf[tid] = 0;
}
```
An example of where the execz branch **should not** be eliminated would be a sequence of conditional accesses where the condition is loaded from memory (the current implementation does not account for that):
```cuda
__global__ void bar(int *buf, size_t n, bool *lookup) {
  unsigned tid = blockDim.x * blockIdx.x + threadIdx.x;
  if (lookup[tid]) {
    buf[tid] = 0;
    // more memory accesses...
  }
}
```
A plausible heuristic for that would be to check whether the branch condition is the result of bitwise and arithmetic operations that involve a workitem id, i.e. to continue checking the operands of a User only if it is such an instruction. That heuristic would benefit from benchmarking.
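To make the idea concrete, here is a minimal, self-contained sketch of that operand walk on a toy expression tree. It is not LLVM code: `Expr`, `Kind`, `make`, `involvesWorkitemId`, and the two condition builders are hypothetical names invented for illustration, and treating comparisons like arithmetic operations is an assumption on my part.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for IR values; hypothetical, not the LLVM class hierarchy.
enum class Kind { WorkitemId, Uniform, Load, Arith, Bitwise, Compare };

struct Expr {
  Kind kind;
  std::vector<std::shared_ptr<Expr>> operands;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr make(Kind kind, std::vector<ExprPtr> operands = {}) {
  return std::make_shared<Expr>(Expr{kind, std::move(operands)});
}

// Returns true if a workitem id is reachable from `e` by walking only
// through arithmetic, bitwise, and (assumption) comparison operations.
// Anything else, e.g. a load, stops the walk.
bool involvesWorkitemId(const Expr &e) {
  switch (e.kind) {
  case Kind::WorkitemId:
    return true;
  case Kind::Arith:
  case Kind::Bitwise:
  case Kind::Compare:
    for (const ExprPtr &op : e.operands)
      if (involvesWorkitemId(*op))
        return true;
    return false;
  default:
    return false; // Uniform values and loads are not looked through.
  }
}

// blockDim.x * blockIdx.x + threadIdx.x
ExprPtr globalTid() {
  ExprPtr blockOffset =
      make(Kind::Arith, {make(Kind::Uniform), make(Kind::Uniform)});
  return make(Kind::Arith, {blockOffset, make(Kind::WorkitemId)});
}

// Condition of foo: tid % 2 == 0
ExprPtr fooCondition() {
  ExprPtr rem = make(Kind::Arith, {globalTid(), make(Kind::Uniform)});
  return make(Kind::Compare, {rem, make(Kind::Uniform)});
}

// Condition of bar: lookup[tid]. The address uses tid, but the condition
// itself is a load, so the walk stops immediately.
ExprPtr barCondition() {
  ExprPtr addr = make(Kind::Arith, {make(Kind::Uniform), globalTid()});
  return make(Kind::Load, {addr});
}
```

Under these assumptions, `involvesWorkitemId(*fooCondition())` is true, so the execz branch in `foo` would be a candidate for elimination, while `involvesWorkitemId(*barCondition())` is false, so the branch in `bar` would be kept.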
@krzysz00 might have more practical examples.
https://github.com/llvm/llvm-project/pull/117567