[all-commits] [llvm/llvm-project] 38c92c: [AArch64] Add patterns for FMADD, FMSUB
OverMighty via All-commits
all-commits at lists.llvm.org
Wed Aug 30 03:39:23 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 38c92c1ee2f07e3260c94d51834a97e84f93c708
https://github.com/llvm/llvm-project/commit/38c92c1ee2f07e3260c94d51834a97e84f93c708
Author: OverMighty <its.overmighty at gmail.com>
Date: 2023-08-30 (Wed, 30 Aug 2023)
Changed paths:
M clang/test/CodeGen/aarch64-neon-scalar-x-indexed-elem-constrained.c
M llvm/lib/Target/AArch64/AArch64InstrFormats.td
M llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-mul.ll
M llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
M llvm/test/CodeGen/AArch64/neon-scalar-by-elem-fma.ll
Log Message:
-----------
[AArch64] Add patterns for FMADD, FMSUB
The scalar FMADD and FMSUB instructions perform as well as or better
than the indexed (by-element) FMLA and FMLS instructions.
For example, the Arm Cortex-A55 Software Optimization Guide lists the "FP
multiply accumulate" instructions FMADD and FMSUB with a throughput of 2
instructions per cycle (IPC), whereas it lists the "ASIMD FP multiply
accumulate, by element" instructions FMLA and FMLS with a throughput of
1 IPC.
The Arm Cortex-A77 Software Optimization Guide, however, does not
separately list "by element" variants of the "ASIMD FP multiply
accumulate" instructions, which are listed with the same throughput of 2
IPC as "FP multiply accumulate" instructions.
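As a rough illustration of the source-level shape these patterns concern,
the sketch below computes a scalar fused multiply-add and multiply-subtract
where one multiplicand is a single lane of a four-element vector. The
helper names fma_by_lane and fms_by_lane are hypothetical and do not appear
in the commit, and std::fma is used as a portable stand-in for the NEON
intrinsics. Which AArch64 instruction a given compiler emits for this code
depends on its version and flags, so treat this as a sketch rather than a
reproduction of the tests changed here.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not from the commit): acc + a * v[lane], computed
// as a single fused operation via std::fma.
inline float fma_by_lane(float acc, float a,
                         const std::array<float, 4>& v, std::size_t lane) {
    assert(lane < v.size());  // lane must index an existing element
    return std::fma(a, v[lane], acc);
}

// Hypothetical helper (not from the commit): acc - a * v[lane], expressed
// as a fused multiply-add with the first multiplicand negated.
inline float fms_by_lane(float acc, float a,
                         const std::array<float, 4>& v, std::size_t lane) {
    assert(lane < v.size());
    return std::fma(-a, v[lane], acc);
}
```

With v = {1, 2, 3, 4}, fma_by_lane(10, 2, v, 1) evaluates 10 + 2 * 2 and
fms_by_lane(10, 2, v, 3) evaluates 10 - 2 * 4. Only one lane of the vector
contributes to the scalar result, which is why a scalar multiply-accumulate
is a candidate in place of the by-element form.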
Reviewed By: samtebbs, dzhidzhoev
Differential Revision: https://reviews.llvm.org/D158008