[llvm] [AArch64][GlobalISel] Add legalization for vecreduce.fmul (PR #73309)
David Green via llvm-commits
llvm-commits at lists.llvm.org
Sat Nov 25 00:36:06 PST 2023
================
@@ -977,6 +977,19 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
.clampMaxNumElements(1, s32, 2)
.lower();
+ // For fmul reductions we need to split up into individual operations. We
+ // clamp to 128 bit vectors then to 64bit vectors to produce a cascade of
+ // smaller types, followed by scalarizing what remains.
----------------
davemgreen wrote:
That seemed to split directly into more 64-bit operations, rather than the 128-bit operation followed by a 64-bit operation that we want.
For example, it produces something like this (compare the CHECK-GI lines against the CHECK-SD cascade):
```
define float @mul_2S(<8 x float> %bin.rdx) {
; CHECK-SD-LABEL: mul_2S:
; CHECK-SD: // %bb.0:
; CHECK-SD-NEXT: fmul v0.4s, v0.4s, v1.4s
; CHECK-SD-NEXT: ext v1.16b, v0.16b, v0.16b, #8
; CHECK-SD-NEXT: fmul v0.2s, v0.2s, v1.2s
; CHECK-SD-NEXT: fmul s0, s0, v0.s[1]
; CHECK-SD-NEXT: ret
;
; CHECK-GI-LABEL: mul_2S:
; CHECK-GI: // %bb.0:
; CHECK-GI-NEXT: mov d2, v0.d[1]
; CHECK-GI-NEXT: mov d3, v1.d[1]
; CHECK-GI-NEXT: fmul v0.2s, v0.2s, v2.2s
; CHECK-GI-NEXT: fmul v1.2s, v1.2s, v3.2s
; CHECK-GI-NEXT: fmul v0.2s, v0.2s, v1.2s
; CHECK-GI-NEXT: mov s1, v0.s[1]
; CHECK-GI-NEXT: fmul s0, s0, s1
; CHECK-GI-NEXT: ret
%r = call fast float @llvm.vector.reduce.fmul.f32.v8f32(float 1.0, <8 x float> %bin.rdx)
ret float %r
}
```
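To make the intended cascade concrete, here is a minimal standalone sketch of the halving order that the CHECK-SD lines follow: one 128-bit multiply (4 lanes), one 64-bit multiply (2 lanes), then a scalar multiply. It is plain C++ rather than LLVM code. The names `mulHalves` and `reduceFMulCascade` are hypothetical and are not part of the patch or of any LLVM API. Reassociating the multiplies this way is only valid because the reduction in the test carries the `fast` flag.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: multiply the low half of v by its high half lane-wise,
// producing a vector with half as many lanes. This models a single vector
// fmul on the next narrower type.
std::vector<float> mulHalves(const std::vector<float> &v) {
  assert(v.size() % 2 == 0 && "expects an even lane count");
  const std::size_t half = v.size() / 2;
  std::vector<float> out(half);
  for (std::size_t i = 0; i < half; ++i)
    out[i] = v[i] * v[i + half];
  return out;
}

// Hypothetical cascade reduction: keep halving until one lane remains.
// For 8 lanes this is 8 -> 4 -> 2 -> 1, i.e. three multiplies in total.
// If `steps` is non-null it receives the number of halving steps performed.
float reduceFMulCascade(std::vector<float> v, int *steps = nullptr) {
  assert(!v.empty() && (v.size() & (v.size() - 1)) == 0 &&
         "expects a power-of-two lane count");
  int n = 0;
  while (v.size() > 1) {
    v = mulHalves(v);
    ++n;
  }
  if (steps)
    *steps = n;
  return v[0];
}
```

For an 8-lane input this performs three multiplies, matching the three `fmul` instructions in the CHECK-SD output, whereas the CHECK-GI output above needs four.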
https://github.com/llvm/llvm-project/pull/73309