[llvm] [AArch64][GlobalISel] Add legalization for vecreduce.fmul (PR #73309)

Sat Nov 25 00:36:06 PST 2023

================
@@ -977,6 +977,19 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
       .clampMaxNumElements(1, s32, 2)
       .lower();
 
+  // For fmul reductions we need to split up into individual operations. We
+  // clamp to 128 bit vectors then to 64bit vectors to produce a cascade of
+  // smaller types, followed by scalarizing what remains.
----------------
davemgreen wrote:

That seemed to split the vector directly into several 64-bit operations, rather than the 128-bit step followed by a 64-bit step that we want.

i.e. it produces something like the CHECK-GI output below (the CHECK-SD lines show the 128-bit then 64-bit cascade for comparison):
```
define float @mul_2S(<8 x float> %bin.rdx)  {
; CHECK-SD-LABEL: mul_2S:
; CHECK-SD:       // %bb.0:
; CHECK-SD-NEXT:    fmul v0.4s, v0.4s, v1.4s
; CHECK-SD-NEXT:    ext v1.16b, v0.16b, v0.16b, #8
; CHECK-SD-NEXT:    fmul v0.2s, v0.2s, v1.2s
; CHECK-SD-NEXT:    fmul s0, s0, v0.s[1]
; CHECK-SD-NEXT:    ret
;
; CHECK-GI-LABEL: mul_2S:
; CHECK-GI:       // %bb.0:
; CHECK-GI-NEXT:    mov d2, v0.d[1]
; CHECK-GI-NEXT:    mov d3, v1.d[1]
; CHECK-GI-NEXT:    fmul v0.2s, v0.2s, v2.2s
; CHECK-GI-NEXT:    fmul v1.2s, v1.2s, v3.2s
; CHECK-GI-NEXT:    fmul v0.2s, v0.2s, v1.2s
; CHECK-GI-NEXT:    mov s1, v0.s[1]
; CHECK-GI-NEXT:    fmul s0, s0, s1
; CHECK-GI-NEXT:    ret
  %r = call fast float @llvm.vector.reduce.fmul.f32.v8f32(float 1.0, <8 x float> %bin.rdx)
  ret float %r
}
```
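
To make the difference between the two shapes concrete, here is a rough standalone C++ model of both strategies for an 8 x f32 input. This is only a sketch and not legalizer code: `ReductionStats`, `mulHalves`, `reduceCascade` and `reduceFlat64` are hypothetical names invented for illustration, and the model only counts multiplies by width. It ignores the `ext`/`mov` shuffles and the start value of 1.0, which is the multiplicative identity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Product plus the number of multiplies issued at each width.
struct ReductionStats {
  float product;
  int wide128; // 4 x f32 multiplies
  int wide64;  // 2 x f32 multiplies
  int scalar;  // scalar multiplies
};

// Multiply the low half of v by its high half, lane by lane.
std::vector<float> mulHalves(const std::vector<float> &v) {
  assert(v.size() % 2 == 0 && "need an even number of lanes");
  std::size_t half = v.size() / 2;
  std::vector<float> out(half);
  for (std::size_t i = 0; i < half; ++i)
    out[i] = v[i] * v[i + half];
  return out;
}

// Cascade: 8 lanes -> 4 lanes (128-bit) -> 2 lanes (64-bit) -> scalar.
ReductionStats reduceCascade(std::vector<float> v) {
  assert(v.size() == 8 && "model only covers <8 x float>");
  ReductionStats s{1.0f, 0, 0, 0};
  while (v.size() > 1) {
    v = mulHalves(v);
    if (v.size() == 4)
      ++s.wide128;
    else if (v.size() == 2)
      ++s.wide64;
    else
      ++s.scalar;
  }
  s.product = v[0];
  return s;
}

// Flat split: cut into four 2-lane pieces first, then combine the pieces
// pairwise with 64-bit multiplies, and finish with one scalar multiply.
ReductionStats reduceFlat64(const std::vector<float> &v) {
  assert(v.size() == 8 && "model only covers <8 x float>");
  ReductionStats s{1.0f, 0, 0, 0};
  std::vector<std::vector<float>> pieces;
  for (std::size_t i = 0; i < v.size(); i += 2)
    pieces.push_back({v[i], v[i + 1]});
  while (pieces.size() > 1) {
    std::vector<std::vector<float>> next;
    for (std::size_t i = 0; i < pieces.size(); i += 2) {
      next.push_back({pieces[i][0] * pieces[i + 1][0],
                      pieces[i][1] * pieces[i + 1][1]});
      ++s.wide64;
    }
    pieces = next;
  }
  s.product = pieces[0][0] * pieces[0][1];
  ++s.scalar;
  return s;
}
```

In this model the cascade issues one multiply at each width (three in total), while the flat split issues three 64-bit multiplies plus a scalar one (four in total), which lines up with the `fmul` counts in the CHECK-SD and CHECK-GI output above. Both orders are only valid because the reduction is `fast` and may be reassociated.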

https://github.com/llvm/llvm-project/pull/73309