[libc-commits] [clang] [libc] [Clang] Improve scan in gpuintrin.h (PR #189381)
Matt Arsenault via libc-commits
libc-commits at lists.llvm.org
Mon Mar 30 07:04:28 PDT 2026
================
@@ -213,7 +213,7 @@ __gpu_shuffle_idx_f64(uint64_t __lane_mask, uint32_t __idx, double __x,
__type __x) { \
uint64_t __above = __lane_mask & -(2ull << __gpu_lane_id()); \
for (uint32_t __step = 1; __step < __gpu_num_lanes(); __step *= 2) { \
- uint32_t __src = __above ? __builtin_ctzg(__above) : __gpu_lane_id(); \
+ uint32_t __src = __builtin_ctzg(__above); \
----------------
arsenm wrote:
You should use the bit width. That is the form the optimizer best understands how to turn into the poison-on-zero case. The codegen for -1 is worse in the 64-bit case.
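A minimal sketch of one reading of this suggestion, assuming the bit width is meant as the second (fallback) argument of `__builtin_ctzg`, which is the value returned for a zero input. The helper names `ctz64_or_width` and `next_lane_above` are hypothetical and not part of gpuintrin.h. The `__has_builtin` guard is there only so the sketch also builds on compilers that lack `__builtin_ctzg`.

```c
#include <assert.h>
#include <stdint.h>

/* Detect __builtin_ctzg where the compiler can report it. */
#ifdef __has_builtin
#if __has_builtin(__builtin_ctzg)
#define SKETCH_HAVE_CTZG 1
#endif
#endif

/* Hypothetical helper: count trailing zeros of a 64-bit mask and return
   the bit width (64) when the mask is zero. */
static inline uint32_t ctz64_or_width(uint64_t mask) {
#ifdef SKETCH_HAVE_CTZG
  /* The second argument is the result for a zero input. */
  return (uint32_t)__builtin_ctzg(mask, 64);
#else
  /* Portable equivalent: guard the zero case explicitly. */
  return mask ? (uint32_t)__builtin_ctzll(mask) : 64u;
#endif
}

/* Hypothetical helper mirroring the __above computation in the hunk:
   keep only the lanes strictly above lane_id, then find the lowest one.
   Returns 64 when no active lane lies above lane_id. */
static inline uint32_t next_lane_above(uint64_t lane_mask, uint32_t lane_id) {
  uint64_t above = lane_mask & -(2ull << lane_id);
  return ctz64_or_width(above);
}

/* Small self-check of the zero and non-zero cases. */
static inline void ctz_sketch_self_check(void) {
  assert(ctz64_or_width(0) == 64);
  assert(next_lane_above(~0ull, 3) == 4);
}
```

With this shape, an empty `above` mask yields 64, an out-of-range lane index, instead of relying on a separate `__above ? ... : ...` branch. Whether that index is acceptable to the shuffle that consumes it is not established by the hunk above.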
https://github.com/llvm/llvm-project/pull/189381