[llvm] [AMDGPU] Eliminate likely-spurious execz checks (PR #117567)

Jay Foad via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 26 06:07:43 PST 2024


jayfoad wrote:

> An example of where the execz branch **should** be eliminated would be a store in every even thread:
> 
> ```cuda-c++
> __global__ void foo(int *buf, size_t n) {
>     unsigned tid = blockDim.x * blockIdx.x + threadIdx.x;
>     if (tid % 2 == 0)
>         buf[tid] = 0;
> }
> ```

Agreed.

> An example where the execz branch **should not** be eliminated would be a sequence of conditional accesses where the condition is loaded from memory (the current implementation does not respect that):
> 
> ```cuda-c++
> __global__ void bar(int *buf, size_t n, bool *lookup) {
>     unsigned tid = blockDim.x * blockIdx.x + threadIdx.x;
>     if (lookup[tid]) {
>         buf[tid] = 0;
>         // more memory accesses...
>     }
> }
> ```

I'm not so sure about this. The compiler has no information about the contents of `lookup[tid]`, so it has no idea how likely it is that the condition will be false for every lane, correct?
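To make the distinction concrete, here is a rough host-side model of the two conditions (plain C++, not the pass itself and not real ISA semantics). It assumes a 64-lane wave and that every `tid` is in range; `kWaveSize`, `execMask`, `fooMask` and `barMask` are names made up for this sketch. A mask of zero corresponds to the case where the execz branch would actually be taken.

```cpp
#include <cstdint>
#include <vector>

// Assumption for this sketch: 64 lanes per wave (AMDGPU also has a wave32 mode).
constexpr unsigned kWaveSize = 64;

// Build a per-wave mask: bit i is set when the condition holds for lane i,
// i.e. for thread id waveBase + i. A result of 0 models "exec is zero".
template <typename Pred>
std::uint64_t execMask(unsigned waveBase, Pred cond) {
    std::uint64_t mask = 0;
    for (unsigned lane = 0; lane < kWaveSize; ++lane)
        if (cond(waveBase + lane))
            mask |= std::uint64_t{1} << lane;
    return mask;
}

// Condition from foo(): tid % 2 == 0. Half of the lanes in any full wave
// satisfy this, so the mask is never zero regardless of the input data.
std::uint64_t fooMask(unsigned waveBase) {
    return execMask(waveBase, [](unsigned tid) { return tid % 2 == 0; });
}

// Condition from bar(): lookup[tid]. Whether the mask is zero depends purely
// on the run-time contents of lookup, which the compiler cannot see.
std::uint64_t barMask(unsigned waveBase, const std::vector<bool> &lookup) {
    return execMask(waveBase, [&](unsigned tid) { return lookup[tid]; });
}
```

In this model `fooMask` is nonzero for every wave base, whereas `barMask` is zero for an all-false `lookup` and nonzero as soon as a single lane's entry is true, so nothing in the code itself says which case is the common one.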

https://github.com/llvm/llvm-project/pull/117567

