[llvm] [VectorCombine] Fold vector sign-bit checks (PR #175194)
Valeriy Savchenko via llvm-commits
llvm-commits at lists.llvm.org
Sun Jan 11 05:31:26 PST 2026
================
@@ -3806,6 +3807,216 @@ bool VectorCombine::foldCastFromReductions(Instruction &I) {
return true;
}
+/// Fold:
+/// icmp pred (reduce.{add,or,and,umax,umin}(signbit_extract(x))), C
+/// into:
+/// icmp sgt/slt (reduce.{or,umax,and,umin}(x)), -1/0
+///
+/// Sign-bit reductions produce values with known semantics:
+/// - reduce.{or,umax}: 0 if no element is negative, 1 if any is
+/// - reduce.{and,umin}: 1 if all elements are negative, 0 if any isn't
+/// - reduce.add: count of negative elements (0 to NumElts)
----------------
SavchenkoValeriy wrote:
While I agree with this in general, the pattern is quite common in practice: platforms like AArch64 have no horizontal `or`, so a horizontal `add` becomes a viable alternative.
As I mentioned earlier in my other patch (https://github.com/llvm/llvm-project/pull/173069#issuecomment-3676547050), this is the pattern used to implement `_mm_movemask_ps` on NEON. Such an implementation might look something like this:
```c
uint32x4_t input = vreinterpretq_u32_m128(a);
static const int32x4_t shift = {0, 1, 2, 3};
// Isolate the sign bit of each lane...
uint32x4_t tmp = vshrq_n_u32(input, 31);
// ...move each sign bit to its lane index and horizontally add.
return vaddvq_u32(vshlq_u32(tmp, shift));
```
That implementation gets inlined in various places, producing quite inefficient signedness checks in computationally heavy applications.
Combined with #173069, this change optimizes the check away. And it is exactly the `add` version that gives us a substantial boost on our internal benchmarks.
https://github.com/llvm/llvm-project/pull/175194
More information about the llvm-commits
mailing list