[llvm] [LoopVectorizer][AArch64] Add support for partial reduce subtraction (PR #123636)

Fri Feb 7 06:31:32 PST 2025

================
@@ -632,7 +632,7 @@ define i32 @chained_partial_reduce_sub_sub(ptr %a, ptr %b, ptr %c, i32 %N) #0 {
 ; CHECK-SVE-MAXBW-NEXT:    [[TMP17:%.*]] = sub nsw <vscale x 8 x i32> zeroinitializer, [[TMP16]]
 ; CHECK-SVE-MAXBW-NEXT:    [[PARTIAL_REDUCE:%.*]] = call <vscale x 2 x i32> @llvm.experimental.vector.partial.reduce.add.nxv2i32.nxv8i32(<vscale x 2 x i32> [[VEC_PHI]], <vscale x 8 x i32> [[TMP17]])
 ; CHECK-SVE-MAXBW-NEXT:    [[TMP18:%.*]] = mul nsw <vscale x 8 x i32> [[TMP13]], [[TMP15]]
-; CHECK-SVE-MAXBW-NEXT:    [[TMP19:%.*]] = sub nsw <vscale x 8 x i32> zeroinitializer, [[TMP18]]
+; CHECK-SVE-MAXBW-NEXT:    [[TMP19:%.*]] = sub <vscale x 8 x i32> zeroinitializer, [[TMP18]]
----------------
NickGuy-Arm wrote:

In this case, the flags aren't being discarded so much as they never existed. This is the sub that is created when we match an existing sub instruction and replace it with `add(%a, sub(0, %b))`. I'm also not sure the flags would do anything here: the only time this sub would overflow is if TMP18 had already overflowed, and in that case TMP18 would already be a poison value.

I'm happy to add them if they're deemed necessary/helpful though.
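
For reference, here is a minimal standalone sketch of the identity behind the `add(%a, sub(0, %b))` rewrite. It is not code from this PR: the helper names are hypothetical, and `std::uint32_t` arithmetic stands in for LLVM's flag-free (wrapping) `add`/`sub` on `i32`. It also marks the one signed input for which negation itself cannot be represented.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helpers for illustration only; unsigned arithmetic wraps
// modulo 2^32, which models LLVM add/sub without nsw/nuw flags.

// Plain wrapping subtraction: sub(%a, %b).
constexpr std::uint32_t wrapping_sub(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(a - b);
}

// The rewritten form: add(%a, sub(0, %b)).
constexpr std::uint32_t sub_as_add_of_negation(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t neg_b = static_cast<std::uint32_t>(0u - b);  // sub(0, %b)
    return static_cast<std::uint32_t>(a + neg_b);                    // add(%a, neg_b)
}

// True when negating b as a signed 32-bit value is not representable,
// i.e. the single input for which a signed-no-wrap negation would wrap.
constexpr bool negation_overflows_signed(std::int32_t b) {
    return b == std::numeric_limits<std::int32_t>::min();
}

// Compile-time spot checks that both forms agree, including when the
// subtraction wraps below zero.
static_assert(sub_as_add_of_negation(10u, 3u) == wrapping_sub(10u, 3u), "forms agree");
static_assert(sub_as_add_of_negation(3u, 10u) == wrapping_sub(3u, 10u), "forms agree on wrap");
```

Under wrapping semantics the two forms agree for every input, so the sketch only illustrates the value equivalence; whether flags are worth attaching to the new sub is a separate question.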

https://github.com/llvm/llvm-project/pull/123636