[clang] [llvm] [Clang][AArch64] Implement widening FMMLA intrinsics (PR #165282)

Wed Oct 29 07:35:36 PDT 2025

================
@@ -0,0 +1,32 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve-f16f32mm < %s | FileCheck %s --check-prefixes=CHECK
+
+define <vscale x 4 x float> @_Z1tu13__SVFloat32_tu13__SVFloat16_tS0_(<vscale x 4 x float> %acc, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
+; CHECK-LABEL: _Z1tu13__SVFloat32_tu13__SVFloat16_tS0_:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    str x29, [sp, #-16]! // 8-byte Folded Spill
+; CHECK-NEXT:    addvl sp, sp, #-3
+; CHECK-NEXT:    .cfi_escape 0x0f, 0x08, 0x8f, 0x10, 0x92, 0x2e, 0x00, 0x48, 0x1e, 0x22 // sp + 16 + 24 * VG
+; CHECK-NEXT:    .cfi_offset w29, -16
+; CHECK-NEXT:    str z0, [sp, #2, mul vl]
+; CHECK-NEXT:    fmmla z0.s, z1.h, z2.h
+; CHECK-NEXT:    str z1, [sp, #1, mul vl]
+; CHECK-NEXT:    str z2, [sp]
+; CHECK-NEXT:    addvl sp, sp, #3
+; CHECK-NEXT:    ldr x29, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT:    ret
+entry:
+  %acc.addr = alloca <vscale x 4 x float>, align 16
+  %a.addr = alloca <vscale x 8 x half>, align 16
+  %b.addr = alloca <vscale x 8 x half>, align 16
+  store <vscale x 4 x float> %acc, ptr %acc.addr, align 16
+  store <vscale x 8 x half> %a, ptr %a.addr, align 16
+  store <vscale x 8 x half> %b, ptr %b.addr, align 16
+  %0 = load <vscale x 4 x float>, ptr %acc.addr, align 16
+  %1 = load <vscale x 8 x half>, ptr %a.addr, align 16
+  %2 = load <vscale x 8 x half>, ptr %b.addr, align 16
+  %3 = call <vscale x 4 x float> @llvm.aarch64.sve.fmmla.f16f32(<vscale x 4 x float> %0, <vscale x 8 x half> %1, <vscale x 8 x half> %2)
----------------
Lukacma wrote:

There is too much unrelated code here (the allocas, stores, and loads). Backend tests should contain only the intrinsic call.
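
A reduced test could look roughly like the sketch below. It assumes the intrinsic name and signature stay as written in this diff; the function name `fmmla_f16f32` is a hypothetical placeholder. The CHECK lines are left out on purpose, since they would be regenerated with utils/update_llc_test_checks.py.

```llvm
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve-f16f32mm < %s | FileCheck %s

; Hypothetical function name; arguments are passed straight to the intrinsic,
; with no stack round-trip through allocas.
define <vscale x 4 x float> @fmmla_f16f32(<vscale x 4 x float> %acc, <vscale x 8 x half> %a, <vscale x 8 x half> %b) {
  %out = call <vscale x 4 x float> @llvm.aarch64.sve.fmmla.f16f32(<vscale x 4 x float> %acc, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
  ret <vscale x 4 x float> %out
}

; Declaration assumed to match the call in the diff above.
declare <vscale x 4 x float> @llvm.aarch64.sve.fmmla.f16f32(<vscale x 4 x float>, <vscale x 8 x half>, <vscale x 8 x half>)
```

With the arguments used directly, the generated code should no longer need the spill and stack-adjustment sequence seen in the current CHECK lines, which keeps the test focused on instruction selection for the intrinsic.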

https://github.com/llvm/llvm-project/pull/165282