[PATCH] D154558: [AArch64][SVE] Add patterns to support sve indexed FMLA/FMLS
Paul Walker via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jul 7 05:17:54 PDT 2023
paulwalker-arm added inline comments.
================
Comment at: llvm/test/CodeGen/AArch64/sve-fma.ll:7-12
+; CHECK-NEXT: fmla z1.h, z0.h, z2.h[0]
+; CHECK-NEXT: mov z0.d, z1.d
+; CHECK-NEXT: ret
+ %b0splat = shufflevector <vscale x 8 x half> %b, <vscale x 8 x half> undef, <vscale x 8 x i32> zeroinitializer
+ %mad = call <vscale x 8 x half> @llvm.fma.nxv8f16(<vscale x 8 x half> %b0splat, <vscale x 8 x half> %a, <vscale x 8 x half> %c)
+ ret <vscale x 8 x half> %mad
----------------
Looking at https://developer.arm.com/documentation/ddi0602/2023-06/SVE-Instructions/FMLA--indexed---Floating-point-fused-multiply-add-by-indexed-elements--Zda---Zda---Zn---Zm-indexed--- I think you've misunderstood how the indexed instructions operate.
The indexed FMLA instruction does not multiply all elements of `Zn` by `Zm[0]`; rather, it multiplies the elements within each 128-bit chunk of `Zn` by the element of `Zm` selected by the index within that same 128-bit chunk. Taking a 256-bit SVE implementation, an element type of f32 and an index of 1, the operation is:
```
Za[0] += Zn[0]*Zm[1];
Za[1] += Zn[1]*Zm[1];
Za[2] += Zn[2]*Zm[1];
Za[3] += Zn[3]*Zm[1];
Za[4] += Zn[4]*Zm[5];
Za[5] += Zn[5]*Zm[5];
Za[6] += Zn[6]*Zm[5];
Za[7] += Zn[7]*Zm[5];
```
This means that, for these tests to be functionally the same after the transformation, an explicit splat is required, and thus there'd be little point in using the indexed instruction.
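To make the difference concrete, here is a rough reference model of the two behaviours, written in Python purely for illustration. The function names `fmla_indexed` and `fma_with_lane_splat` are hypothetical and are not part of LLVM or any Arm tooling, and the model uses ordinary multiply-then-add arithmetic rather than a true fused operation, so it only illustrates which `Zm` element each lane reads.

```python
def fmla_indexed(zda, zn, zm, index, elt_bits=32):
    """Sketch of SVE indexed FMLA: each 128-bit segment of zn is multiplied
    by the zm element at position `index` within that same segment."""
    per_segment = 128 // elt_bits  # elements per 128-bit segment
    assert 0 <= index < per_segment
    assert len(zda) == len(zn) == len(zm)
    out = []
    for i in range(len(zn)):
        segment_base = (i // per_segment) * per_segment
        # Not a fused operation here; only the element selection is modelled.
        out.append(zda[i] + zn[i] * zm[segment_base + index])
    return out


def fma_with_lane_splat(zda, zn, zm, lane):
    """Sketch of what the IR in the test asks for: one lane of zm is
    broadcast across the whole vector before the multiply-add."""
    return [a + n * zm[lane] for a, n in zip(zda, zn)]


# 256-bit vector of f32: 8 elements, i.e. two 128-bit segments.
zda = [0.0] * 8
zn = [1.0] * 8
zm = [float(i) for i in range(8)]

indexed = fmla_indexed(zda, zn, zm, index=1)
splat = fma_with_lane_splat(zda, zn, zm, lane=1)
```

In this model the second segment of `indexed` reads `zm[5]` while `splat` reads `zm[1]` everywhere, so the two only agree when the vector is exactly 128 bits long.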
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D154558/new/
https://reviews.llvm.org/D154558