[llvm] [WebAssembly] [Backend] Combine and(X, shuffle(X, pow 2 mask)) to all true (PR #145108)

Wed Jun 25 05:17:14 PDT 2025

lukel97 wrote:

From the godbolt initially reported in #129441, if we add explicit casts we get much more sensible LLVM IR with the correct type: https://godbolt.org/z/4Gf7xaf3x

I.e.

```c
bool bar(__i16x8 a) {
    __i16x8 zero = wasm_i8x16_splat(0);
    return __builtin_reduce_and((__i16x8)wasm_i8x16_ne(a, zero));
}
```

Gives

```llvm
define hidden zeroext i1 @bar(<8 x i16> noundef %a) local_unnamed_addr #0 !dbg !26 {
entry:
    #dbg_value(<8 x i16> %a, !32, !DIExpression(), !34)
    #dbg_value(<4 x i32> zeroinitializer, !33, !DIExpression(), !34)
  %0 = bitcast <8 x i16> %a to <16 x i8>, !dbg !35
  %cmp.i = icmp ne <16 x i8> %0, zeroinitializer, !dbg !35
  %sext.i = sext <16 x i1> %cmp.i to <16 x i8>, !dbg !35
  %1 = bitcast <16 x i8> %sext.i to <8 x i16>, !dbg !36
  %rdx.and = tail call i16 @llvm.vector.reduce.and.v8i16(<8 x i16> %1), !dbg !37
  %tobool = icmp ne i16 %rdx.and, 0, !dbg !37
  ret i1 %tobool, !dbg !38
}
```

However, this still doesn't emit all_true, and the vector.reduce.and gets expanded to a bunch of shuffles in ExpandReductions.cpp.
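
For reference, here is a rough scalar model of what that shuffle expansion computes. It is only a sketch, not the actual ExpandReductions.cpp logic: the function names are made up for illustration, and lanes that the shuffle mask would leave undefined are modelled here by reusing the lane's own value. The last helper models an all_true-style check over i16 lanes, which only agrees with "and-reduction != 0" under the stated assumption that every lane is either 0 or all-ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 8

/* Plain AND reduction over all eight i16 lanes. */
static uint16_t reduce_and_linear(const uint16_t v[LANES]) {
    uint16_t acc = 0xFFFF;
    for (int i = 0; i < LANES; ++i)
        acc &= v[i];
    return acc;
}

/* Hypothetical model of the log2 "and(X, shuffle(X, pow2 mask))" chain:
   AND the vector with a copy of itself moved down by 4, then 2, then 1
   lanes, and read the result out of lane 0. */
static uint16_t reduce_and_shuffle(const uint16_t v[LANES]) {
    uint16_t cur[LANES];
    for (int i = 0; i < LANES; ++i)
        cur[i] = v[i];

    for (int off = LANES / 2; off >= 1; off /= 2) {
        uint16_t shuf[LANES];
        for (int i = 0; i < LANES; ++i) {
            /* Lanes past the end are don't-care in this model;
               reuse the lane itself so the AND leaves it unchanged. */
            shuf[i] = (i + off < LANES) ? cur[i + off] : cur[i];
        }
        for (int i = 0; i < LANES; ++i)
            cur[i] &= shuf[i];
    }
    return cur[0];
}

/* Scalar model of an all_true-style check: every lane is nonzero.
   ASSUMPTION: this matches (reduce_and != 0) only when each lane is
   either 0x0000 or 0xFFFF. */
static bool all_lanes_nonzero(const uint16_t v[LANES]) {
    for (int i = 0; i < LANES; ++i)
        if (v[i] == 0)
            return false;
    return true;
}
```

Both reductions give the same value for any input, which is why the shuffle chain is a candidate for being matched back to a single instruction, subject to the mask assumption noted above.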

I think there are two issues here. First, the intrinsics presumably shouldn't be casting everything to v4i32?

And separately we should be able to emit all_true for the LLVM IR above.

Can we change this PR so that it solves the second issue, i.e. handling the above test case by implementing `TargetTransformInfoImpl::shouldExpandReduction` for WebAssembly?

https://github.com/llvm/llvm-project/pull/145108