[PATCH] D154558: [AArch64][SVE] Add patterns to support sve indexed FMLA/FMLS

Fri Jul 7 05:17:54 PDT 2023

paulwalker-arm added inline comments.

================
Comment at: llvm/test/CodeGen/AArch64/sve-fma.ll:7-12
+; CHECK-NEXT:    fmla z1.h, z0.h, z2.h[0]
+; CHECK-NEXT:    mov z0.d, z1.d
+; CHECK-NEXT:    ret
+  %b0splat = shufflevector <vscale x 8 x half> %b, <vscale x 8 x half> undef, <vscale x 8 x i32> zeroinitializer
+  %mad = call <vscale x 8 x half> @llvm.fma.nxv8f16(<vscale x 8 x half> %b0splat, <vscale x 8 x half> %a, <vscale x 8 x half> %c)
+  ret <vscale x 8 x half> %mad
----------------
Looking at https://developer.arm.com/documentation/ddi0602/2023-06/SVE-Instructions/FMLA--indexed---Floating-point-fused-multiply-add-by-indexed-elements--Zda---Zda---Zn---Zm-indexed---, I think you've misunderstood how the indexed instructions operate.

The indexed FMLA instruction does not multiply all elements of `Zn` by `Zm[0]`. Rather, it multiplies the elements within each 128-bit segment of `Zn` by the element selected by the index within the corresponding 128-bit segment of `Zm`. Taking a 256-bit SVE implementation, an element type of f32 and an index of 1, the operation is:
```
Za[0] += Zn[0]*Zm[1];
Za[1] += Zn[1]*Zm[1];
Za[2] += Zn[2]*Zm[1];
Za[3] += Zn[3]*Zm[1];
Za[4] += Zn[4]*Zm[5];
Za[5] += Zn[5]*Zm[5];
Za[6] += Zn[6]*Zm[5];
Za[7] += Zn[7]*Zm[5];
```
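The IR splats element 0 of `%b` across the whole vector, whereas the indexed instruction only reads element 0 for the first 128-bit segment. This means that for these tests to remain functionally equivalent after the transformation an explicit splat is required, and thus there'd be little point in using the indexed instruction.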
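To make the lane mapping concrete, here is a minimal Python model of the segmented behaviour described above, compared against the semantics of the test IR, `fma(splat(%b[0]), %a, %c)`. The helper names (`zm_lane`, `fmla_indexed`, `fma_with_splat`) are hypothetical and only model the arithmetic per lane; Python floats are doubles, so this ignores f16 precision and the single rounding of a fused multiply-add (the inputs below are small integers, so the results are exact either way).

```python
def zm_lane(i: int, elem_bits: int, index: int) -> int:
    """Zm lane read by indexed FMLA for destination lane i (hypothetical model)."""
    per_segment = 128 // elem_bits          # elements per 128-bit segment
    assert 0 <= index < per_segment         # the index selects within one segment
    return (i // per_segment) * per_segment + index


def fmla_indexed(za, zn, zm, elem_bits, index):
    """Model of indexed FMLA: za[i] + zn[i] * zm[segment_base(i) + index]."""
    return [za[i] + zn[i] * zm[zm_lane(i, elem_bits, index)] for i in range(len(za))]


def fma_with_splat(c, a, b):
    """Model of the test IR fma(splat(b[0]), a, c): c[i] + b[0] * a[i]."""
    return [c[i] + b[0] * a[i] for i in range(len(c))]


# Reproduce the table above: 256-bit vectors, f32 elements, index 1.
for i in range(256 // 32):
    print(f"Za[{i}] += Zn[{i}]*Zm[{zm_lane(i, 32, 1)}];")

# f16 elements as in the test: a 256-bit vector holds 16 lanes.
lanes = 256 // 16
a = [float(i) for i in range(lanes)]
b = [float(10 + i) for i in range(lanes)]
c = [1.0] * lanes

# 128-bit vectors (first 8 lanes): a single segment, so the two forms agree.
print(fmla_indexed(c[:8], a[:8], b[:8], 16, 0) == fma_with_splat(c[:8], a[:8], b[:8]))
# 256-bit vectors: the upper segment reads b[8] rather than b[0].
print(fmla_indexed(c, a, b, 16, 0) == fma_with_splat(c, a, b))
# Splatting b[0] first restores equivalence, i.e. the extra splat is needed.
print(fmla_indexed(c, a, [b[0]] * lanes, 16, 0) == fma_with_splat(c, a, b))
```

The first loop prints the same eight lines as the table, and the comparisons print `True`, `False`, `True`: the indexed form only matches the IR for 128-bit vectors or when the multiplicand has already been splatted.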


https://reviews.llvm.org/D154558