[llvm] [AMDGPU] Eliminate likely-spurious execz checks (PR #117567)

Fabian Ritter via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 26 05:46:10 PST 2024


ritter-x2a wrote:

@jayfoad 
An example where the execz branch **should** be eliminated is a store in every even-numbered thread:

```cuda
__global__ void foo(int *buf, size_t n) {
    unsigned tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid % 2 == 0)
        buf[tid] = 0;
}
```
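The following host-side C++ sketch (CUDA C++ compiles it unchanged) illustrates why the execz check is likely spurious here. It computes the exec mask a wave would have inside `if (tid % 2 == 0)`. It assumes that every lane of the wave is active and that lane `i` handles `tid = waveBase + i`. `execMaskFor` and its parameters are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical helper: returns the exec mask of one wave inside `if (pred(tid))`.
// Assumption: all lanes are active and lane i handles tid = waveBase + i.
std::uint64_t execMaskFor(unsigned waveBase, unsigned waveSize,
                          const std::function<bool(unsigned)>& pred) {
  // AMDGPU waves are 32 or 64 lanes wide; the mask must fit in 64 bits.
  assert(waveSize == 32 || waveSize == 64);
  std::uint64_t mask = 0;
  for (unsigned lane = 0; lane < waveSize; ++lane)
    if (pred(waveBase + lane))
      mask |= std::uint64_t{1} << lane;  // lane stays enabled in exec
  return mask;
}
```

Under these assumptions, the wave64 mask for `tid % 2 == 0` is `0x5555555555555555` when `waveBase` is even and `0xAAAAAAAAAAAAAAAA` when it is odd. It is therefore never zero, so the execz branch is never taken.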

An example where the execz branch **should not** be eliminated is a sequence of conditional memory accesses whose condition is loaded from memory (the current implementation does not handle this case):

```cuda
__global__ void bar(int *buf, size_t n, bool *lookup) {
    unsigned tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (lookup[tid]) {
        buf[tid] = 0;
        // more memory accesses...
    }
}
```

A plausible heuristic would be to check whether the branch condition is computed by bitwise and arithmetic operations that involve a workitem id, i.e., to continue checking the operands of a User only if it is such an instruction. Benchmarking would be needed to confirm that this heuristic pays off.
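Below is a minimal sketch of that heuristic over a hypothetical toy IR. The names `Kind`, `Node`, `Graph`, and `execzLikelySpurious` are illustrative only and are not LLVM's API. The walk starts at the branch condition and descends into operands only through arithmetic, bitwise, and comparison nodes. It succeeds if it reaches a workitem id, which stands in for something like `llvm.amdgcn.workitem.id.x`. Including comparisons in the walk is an assumption, made because a branch condition is usually produced by one. Loads end the walk, so the condition in `bar` keeps its execz branch while the one in `foo` does not.

```cpp
#include <cassert>
#include <deque>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical toy IR, not LLVM's API.
enum class Kind { WorkitemId, Constant, Load, Other,
                  Add, Mul, URem, And, Or, Xor, Shl, LShr, ICmp };

struct Node {
  Kind kind;
  std::vector<const Node*> ops;
};

// Owns the nodes; std::deque keeps element addresses stable on push_back.
struct Graph {
  std::deque<Node> nodes;
  const Node* make(Kind k, std::vector<const Node*> ops = {}) {
    nodes.push_back(Node{k, std::move(ops)});
    return &nodes.back();
  }
};

// Only through these node kinds do we keep inspecting operands.
static bool isArithBitwiseOrCmp(Kind k) {
  switch (k) {
  case Kind::Add: case Kind::Mul: case Kind::URem: case Kind::And:
  case Kind::Or: case Kind::Xor: case Kind::Shl: case Kind::LShr:
  case Kind::ICmp:
    return true;
  default:
    return false;
  }
}

static bool reachesWorkitemId(const Node* n, std::unordered_set<const Node*>& seen) {
  if (!seen.insert(n).second) return false;        // already explored
  if (n->kind == Kind::WorkitemId) return true;
  if (!isArithBitwiseOrCmp(n->kind)) return false;  // loads etc. end the walk
  for (const Node* op : n->ops)
    if (reachesWorkitemId(op, seen)) return true;
  return false;
}

// Hypothetical predicate: true if the execz branch on `cond` looks spurious.
bool execzLikelySpurious(const Node* cond) {
  assert(cond != nullptr && "branch condition must be set");
  std::unordered_set<const Node*> seen;
  return reachesWorkitemId(cond, seen);
}
```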

@krzysz00 might have more practical examples.

https://github.com/llvm/llvm-project/pull/117567

