[llvm] [AMDGPU] Eliminate likely-spurious execz checks (PR #117567)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 26 06:07:43 PST 2024
jayfoad wrote:
> An example of where the execz branch **should** be eliminated would be a store in every even thread:
>
> ```cuda-c++
> __global__ void foo(int *buf, size_t n) {
>   unsigned tid = blockDim.x * blockIdx.x + threadIdx.x;
>   if (tid % 2 == 0)
>     buf[tid] = 0;
> }
> ```
Agreed.
> An example where the execz branch **should not** be eliminated would be a sequence of conditional accesses where the condition is loaded from memory (the current implementation does not respect that):
>
> ```cuda-c++
> __global__ void bar(int *buf, size_t n, bool *lookup) {
>   unsigned tid = blockDim.x * blockIdx.x + threadIdx.x;
>   if (lookup[tid]) {
>     buf[tid] = 0;
>     // more memory accesses...
>   }
> }
> ```
I'm not so sure about this. The compiler has no information about the contents of `lookup[tid]`, so it has no idea how likely it is that the condition will be false for every lane, correct?
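To make the difference between the two examples concrete, here is a rough host-side model of the EXEC mask inside each `if` body. It is only a sketch and not what the PR implements: it assumes a wave64 target, assumes every lane is active on entry to the branch, and the helper names (`execMaskEvenLanes`, `execMaskFromLookup`, `execz`) are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: wave64, so one bit of a 64-bit mask per lane.
constexpr unsigned kWaveSize = 64;

// Hypothetical helper: EXEC mask inside `if (tid % 2 == 0)` for the wave
// whose lane 0 has global thread id `waveBase`, assuming all lanes are
// active before the branch.
std::uint64_t execMaskEvenLanes(unsigned waveBase) {
  std::uint64_t mask = 0;
  for (unsigned lane = 0; lane < kWaveSize; ++lane)
    if ((waveBase + lane) % 2 == 0)
      mask |= std::uint64_t{1} << lane;
  return mask;
}

// Hypothetical helper: EXEC mask inside `if (lookup[tid])` for the same wave.
// The result depends entirely on runtime data.
std::uint64_t execMaskFromLookup(const std::vector<bool> &lookup,
                                 unsigned waveBase) {
  assert(static_cast<std::size_t>(waveBase) + kWaveSize <= lookup.size());
  std::uint64_t mask = 0;
  for (unsigned lane = 0; lane < kWaveSize; ++lane)
    if (lookup[waveBase + lane])
      mask |= std::uint64_t{1} << lane;
  return mask;
}

// The condition an execz branch tests: no lane is active.
bool execz(std::uint64_t mask) { return mask == 0; }
```

Under these assumptions, `execz(execMaskEvenLanes(base))` is false for every `base`, because any 64 consecutive thread ids contain even ones, so the skip branch can never be taken. For `execMaskFromLookup` the answer is only known once the contents of `lookup` are, which is exactly the information the compiler lacks.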
https://github.com/llvm/llvm-project/pull/117567