[libc-commits] [clang] [libc] [Clang] Improve scan in gpuintrin.h (PR #189381)

Matt Arsenault via libc-commits libc-commits at lists.llvm.org
Mon Mar 30 07:04:28 PDT 2026


================
@@ -213,7 +213,7 @@ __gpu_shuffle_idx_f64(uint64_t __lane_mask, uint32_t __idx, double __x,
                                             __type __x) {                      \
     uint64_t __above = __lane_mask & -(2ull << __gpu_lane_id());               \
     for (uint32_t __step = 1; __step < __gpu_num_lanes(); __step *= 2) {       \
-      uint32_t __src = __above ? __builtin_ctzg(__above) : __gpu_lane_id();    \
+      uint32_t __src = __builtin_ctzg(__above);                                \
----------------
arsenm wrote:

You should use the bit width as the fallback value. That is the form the optimizer best understands how to turn into the poison-on-zero case. The codegen for -1 is worse in the 64-bit case.
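
For illustration, here is a minimal sketch of the two fallback choices being compared, assuming the optional second argument of `__builtin_ctzg` (the value returned when the input is zero). The helper names `ctz64_or_width` and `ctz64_or_minus_one` are hypothetical and are not part of `gpuintrin.h`. The `__has_builtin` guard is only there so the sketch also compiles on compilers without `__builtin_ctzg`. For context, LLVM's `llvm.cttz` intrinsic is defined to return the bit width on a zero input when its `is_zero_poison` flag is false, which is presumably why the bit-width fallback maps onto it directly.

```c
#include <stdint.h>

#ifndef __has_builtin
#define __has_builtin(x) 0 /* older compilers: take the portable path below */
#endif

/* Hypothetical helper: trailing-zero count of a 64-bit mask, returning the
   bit width (64) when the mask is zero. */
static inline uint32_t ctz64_or_width(uint64_t mask) {
#if __has_builtin(__builtin_ctzg)
  return (uint32_t)__builtin_ctzg(mask, 64);
#else
  return mask ? (uint32_t)__builtin_ctzll(mask) : 64u;
#endif
}

/* Hypothetical helper: same count, but with -1 as the zero fallback.
   After the cast, a zero mask yields UINT32_MAX. */
static inline uint32_t ctz64_or_minus_one(uint64_t mask) {
#if __has_builtin(__builtin_ctzg)
  return (uint32_t)__builtin_ctzg(mask, -1);
#else
  return mask ? (uint32_t)__builtin_ctzll(mask) : UINT32_MAX;
#endif
}
```

Both helpers agree on every nonzero mask and differ only in the value produced for zero, so the choice between them is purely about which form the optimizer handles better.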

https://github.com/llvm/llvm-project/pull/189381