[llvm] [VectorCombine] Fold vector sign-bit checks (PR #175194)
Valeriy Savchenko via llvm-commits
llvm-commits at lists.llvm.org
Sun Jan 11 05:31:26 PST 2026
================
@@ -3806,6 +3807,216 @@ bool VectorCombine::foldCastFromReductions(Instruction &I) {
return true;
}
+/// Fold:
+/// icmp pred (reduce.{add,or,and,umax,umin}(signbit_extract(x))), C
+/// into:
+/// icmp sgt/slt (reduce.{or,umax,and,umin}(x)), -1/0
+///
+/// Sign-bit reductions produce values with known semantics:
+/// - reduce.{or,umax}: 0 if no element is negative, 1 if any is
+/// - reduce.{and,umin}: 1 if all elements are negative, 0 if any isn't
+/// - reduce.add: count of negative elements (0 to NumElts)
----------------
SavchenkoValeriy wrote:
While I agree with this in general, the pattern is quite common in practice: platforms like AArch64 have no horizontal `or`, so a horizontal `add` becomes a viable alternative.
As I mentioned earlier in my other patch (https://github.com/llvm/llvm-project/pull/173069#issuecomment-3676547050), this is the pattern used to implement `_mm_movemask_ps` on NEON. Such an implementation might look something like this:
```c
uint32x4_t input = vreinterpretq_u32_m128(a);
static const int32x4_t shift = {0, 1, 2, 3};
// Isolate the sign bit of each lane...
uint32x4_t tmp = vshrq_n_u32(input, 31);
// ...move each sign bit to its lane index and horizontally add.
return vaddvq_u32(vshlq_u32(tmp, shift));
```
That implementation gets inlined in various places, producing quite inefficient signedness checks in computationally heavy applications.
Combined with #173069, this change optimizes the check away. And it is exactly the `add` version that gives us a substantial boost on our internal benchmarks.
https://github.com/llvm/llvm-project/pull/175194
More information about the llvm-commits
mailing list