[all-commits] [llvm/llvm-project] ea045b: [AArch64] Add patterns for scalar FMUL, FMULX

Fri Jun 30 00:34:35 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: ea045b99da8ee236076fddb256bdac98681441fa
      https://github.com/llvm/llvm-project/commit/ea045b99da8ee236076fddb256bdac98681441fa
  Author: OverMighty <its.overmighty at gmail.com>
  Date:   2023-06-30 (Fri, 30 Jun 2023)

  Changed paths:
    M llvm/lib/Target/AArch64/AArch64InstrFormats.td
    M llvm/lib/Target/AArch64/AArch64InstrInfo.td
    M llvm/test/CodeGen/AArch64/arm64-fma-combines.ll
    M llvm/test/CodeGen/AArch64/arm64-fml-combines.ll
    M llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll
    M llvm/test/CodeGen/AArch64/arm64-neon-scalar-by-elem-mul.ll
    M llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-mul.ll
    M llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
    M llvm/test/CodeGen/AArch64/vecreduce-fmul-legalization-strict.ll

  Log Message:
  -----------
  [AArch64] Add patterns for scalar FMUL, FMULX

Scalar FMUL and FMULX instructions perform at least as well as the indexed
(by-element) forms of FMUL and FMULX.
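
For readers less familiar with the two forms, here is a minimal C sketch of the
source-level distinction. It assumes the ACLE intrinsic vmuls_lane_f32 from
arm_neon.h, which ACLE documents as corresponding to the by-element form
(FMUL Sd, Sn, Vm.S[lane]). The helper names mul_scalar and mul_by_lane0 are
hypothetical, the non-AArch64 fallback exists only so the sketch builds on other
targets, and which instruction a given compiler version actually selects is not
shown here.

```c
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Plain scalar multiply: both operands are ordinary scalar values,
 * so the scalar form of FMUL is the natural candidate on AArch64. */
float mul_scalar(float a, float b) {
    return a * b;
}

/* Multiply a scalar by lane 0 of a two-element vector.
 * On AArch64 this goes through the by-element intrinsic; elsewhere a
 * plain multiply stands in for it (assumption made for portability). */
float mul_by_lane0(float a, const float v[2]) {
#if defined(__aarch64__)
    float32x2_t vec = vld1_f32(v);      /* load both lanes */
    return vmuls_lane_f32(a, vec, 0);   /* a * vec[0] */
#else
    return a * v[0];
#endif
}
```

Both helpers compute the same product when the second operand of mul_scalar
equals v[0]; they differ only in whether the operand is named as a scalar or as
a vector lane.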

For example, the Arm Cortex-A55 Software Optimization Guide lists the following
instructions with a throughput of 2 IPC:
 - "FP multiply" FMUL
 - "ASIMD FP multiply" FMULX

whereas it lists the following with a throughput of 1 IPC:
 - "ASIMD FP multiply, by element" FMUL, FMULX

The Arm Cortex-A510 Software Optimization Guide, however, does not separately
list "by element" variants of the "ASIMD FP multiply" instructions, which are
listed with the same throughput as the non-ASIMD ones.
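
As a rough illustration of what the Cortex-A55 figures above mean, the following
C sketch computes an idealized lower bound on issue cycles for a run of
independent multiplies. It assumes throughput is the only limit (no latency
chains, no other instructions competing for the same pipes). The function name
min_issue_cycles and the count of 1000 are hypothetical.

```c
/* Idealized lower bound on the cycles needed to issue n independent
 * instructions at a sustained throughput of ipc instructions per cycle.
 * Assumption: throughput is the only bottleneck. */
unsigned min_issue_cycles(unsigned n, unsigned ipc) {
    return (n + ipc - 1) / ipc;  /* ceiling division */
}
```

Under these assumptions, 1000 independent multiplies need at least
min_issue_cycles(1000, 2) = 500 cycles at 2 IPC, against
min_issue_cycles(1000, 1) = 1000 cycles at 1 IPC.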

Fixes #60817.

Differential Revision: https://reviews.llvm.org/D153207