[llvm] [AArch64][SVE] Fold zero-extend into add reduction. (PR #102325)

Fri Aug 9 01:47:52 PDT 2024

davemgreen wrote:

Thanks for this. All the added tests look good to me, but could this turn `i64 vecreduce(sext(v4i32))` into the SVE `saddv d0, p0, z0.s` instead of the NEON `saddlv d0, v0.4s`? Can we get it to not do that one?
```
define i64 @add_v4i32_v4i64_sext(<4 x i32> %x) {
; CHECK-LABEL: add_v4i32_v4i64_sext:
; CHECK:       // %bb.0: // %entry
; CHECK-NEXT:    saddlv d0, v0.4s
; CHECK-NEXT:    fmov x0, d0
; CHECK-NEXT:    ret
entry:
  %xx = sext <4 x i32> %x to <4 x i64>
  %z = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> %xx)
  ret i64 %z
}
```

It might be worth running `bin/llc -mtriple aarch64-none-eabi -mattr=+sve2 ../llvm/test/CodeGen/AArch64/vecreduce-add.ll -o -` and comparing the output against what it was before this patch. There are quite a few examples in that test file now.
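As a rough sketch of that comparison, assuming you have two `llc` binaries available (one built before the patch and one built with it), something like the following would do. The helper name `compare_llc` and the build paths in the usage line are hypothetical; only the `llc` flags come from the command above.

```shell
# Hypothetical helper: run two llc builds on the same test file with the
# flags from the command above and show a unified diff of the assembly.
# Returns diff's exit status: 0 if the output is identical, 1 if it differs.
compare_llc() {
  cmp_old_llc=$1    # llc built without the patch (assumed path)
  cmp_new_llc=$2    # llc built with the patch (assumed path)
  cmp_test_file=$3  # e.g. ../llvm/test/CodeGen/AArch64/vecreduce-add.ll

  cmp_old_out=$(mktemp)
  cmp_new_out=$(mktemp)

  "$cmp_old_llc" -mtriple aarch64-none-eabi -mattr=+sve2 "$cmp_test_file" -o - > "$cmp_old_out"
  "$cmp_new_llc" -mtriple aarch64-none-eabi -mattr=+sve2 "$cmp_test_file" -o - > "$cmp_new_out"

  diff -u "$cmp_old_out" "$cmp_new_out"
  cmp_status=$?

  rm -f "$cmp_old_out" "$cmp_new_out"
  return "$cmp_status"
}
```

For example, `compare_llc ../build-before/bin/llc bin/llc ../llvm/test/CodeGen/AArch64/vecreduce-add.ll` (with `../build-before` standing in for wherever the pre-patch build lives). Any hunk where `saddlv` is replaced by `saddv` would be a case like the one above.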

https://github.com/llvm/llvm-project/pull/102325