[PATCH] D154558: [AArch64][SVE] Add patterns to support sve indexed FMLA/FMLS

Paul Walker via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 7 05:17:54 PDT 2023


paulwalker-arm added inline comments.


================
Comment at: llvm/test/CodeGen/AArch64/sve-fma.ll:7-12
+; CHECK-NEXT:    fmla z1.h, z0.h, z2.h[0]
+; CHECK-NEXT:    mov z0.d, z1.d
+; CHECK-NEXT:    ret
+  %b0splat = shufflevector <vscale x 8 x half> %b, <vscale x 8 x half> undef, <vscale x 8 x i32> zeroinitializer
+  %mad = call <vscale x 8 x half> @llvm.fma.nxv8f16(<vscale x 8 x half> %b0splat, <vscale x 8 x half> %a, <vscale x 8 x half> %c)
+  ret <vscale x 8 x half> %mad
----------------
Looking at https://developer.arm.com/documentation/ddi0602/2023-06/SVE-Instructions/FMLA--indexed---Floating-point-fused-multiply-add-by-indexed-elements--Zda---Zda---Zn---Zm-indexed--- I think you've misunderstood how the indexed instructions operate.

The indexed FMLA instruction does not multiply all elements of `Zn` by `Zm[0]`; rather, it multiplies the elements within each 128-bit chunk of `Zn` by the element of `Zm` at the given index within that same 128-bit chunk. Taking a 256-bit SVE implementation, an element type of f32 and an index of 1, the operation is:
```
Za[0] += Zn[0]*Zm[1];
Za[1] += Zn[1]*Zm[1];
Za[2] += Zn[2]*Zm[1];
Za[3] += Zn[3]*Zm[1];
Za[4] += Zn[4]*Zm[5];
Za[5] += Zn[5]*Zm[5];
Za[6] += Zn[6]*Zm[5];
Za[7] += Zn[7]*Zm[5];
```
This means that for these tests to be functionally the same after the transformation, an explicit splat is still required, and thus there'd be little point in using the indexed instruction.
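
To make the difference concrete, here is a rough Python model of the two behaviours. It is only a sketch: the helper names `fmla_indexed` and `fma_splat_lane0` are made up for illustration, vectors are plain lists, and the single rounding of a real fused multiply-add is not modelled.

```python
def fmla_indexed(zda, zn, zm, index, elt_bits=32):
    """Sketch of SVE indexed FMLA: the multiplier is picked per 128-bit segment."""
    elts_per_segment = 128 // elt_bits
    assert len(zda) == len(zn) == len(zm)
    assert len(zda) % elts_per_segment == 0
    assert 0 <= index < elts_per_segment
    result = []
    for e in range(len(zda)):
        # First element of the 128-bit segment that element e belongs to.
        segment_base = (e // elts_per_segment) * elts_per_segment
        result.append(zda[e] + zn[e] * zm[segment_base + index])
    return result


def fma_splat_lane0(zda, zn, zm):
    """What the IR in the test asks for: every lane multiplied by zm[0]."""
    return [a + n * zm[0] for a, n in zip(zda, zn)]


# 256-bit vector of f32 (8 lanes), hypothetical input values.
zda = [0.0] * 8
zn = [1.0] * 8
zm = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]

indexed_1 = fmla_indexed(zda, zn, zm, index=1)
indexed_0 = fmla_indexed(zda, zn, zm, index=0)
splat_0 = fma_splat_lane0(zda, zn, zm)
```

In this model the upper segment of `indexed_0` uses `zm[4]` rather than `zm[0]`, so it only agrees with `splat_0` when the vector is exactly 128 bits long.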


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154558/new/

https://reviews.llvm.org/D154558


