[all-commits] [llvm/llvm-project] ea045b: [AArch64] Add patterns for scalar FMUL, FMULX
OverMighty via All-commits
all-commits at lists.llvm.org
Fri Jun 30 00:34:35 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: ea045b99da8ee236076fddb256bdac98681441fa
https://github.com/llvm/llvm-project/commit/ea045b99da8ee236076fddb256bdac98681441fa
Author: OverMighty <its.overmighty at gmail.com>
Date: 2023-06-30 (Fri, 30 Jun 2023)
Changed paths:
M llvm/lib/Target/AArch64/AArch64InstrFormats.td
M llvm/lib/Target/AArch64/AArch64InstrInfo.td
M llvm/test/CodeGen/AArch64/arm64-fma-combines.ll
M llvm/test/CodeGen/AArch64/arm64-fml-combines.ll
M llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll
M llvm/test/CodeGen/AArch64/arm64-neon-scalar-by-elem-mul.ll
M llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-mul.ll
M llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
M llvm/test/CodeGen/AArch64/vecreduce-fmul-legalization-strict.ll
Log Message:
-----------
[AArch64] Add patterns for scalar FMUL, FMULX
Scalar FMUL and FMULX instructions perform as well as or better than indexed
(by-element) FMUL and FMULX.
For example, the Arm Cortex-A55 Software Optimization Guide lists the following
instructions with a throughput of 2 IPC:
- "FP multiply" FMUL
- "ASIMD FP multiply" FMULX
whereas it lists the following with a throughput of 1 IPC:
- "ASIMD FP multiply, by element" FMUL, FMULX
The Arm Cortex-A510 Software Optimization Guide, however, does not list
"by element" variants of the "ASIMD FP multiply" instructions separately; the
"ASIMD FP multiply" instructions are listed with the same throughput as the
non-ASIMD ones.
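As a rough illustration of what the Cortex-A55 figures above mean, the sketch
below computes a simple issue-bound lower limit on cycles for a run of
independent multiplies at a given throughput (IPC). It is a simplification
under stated assumptions: it ignores latency, dependency chains, and
contention from other instructions, and the instruction count is hypothetical.
Only the 2 IPC and 1 IPC values come from the log message.

```python
import math


def issue_bound_cycles(num_insts: int, ipc: float) -> int:
    """Lower bound on cycles to issue `num_insts` independent instructions
    at a sustained throughput of `ipc` instructions per cycle.

    Simplifying assumption: latency, dependencies and resource contention
    from other instructions are ignored.
    """
    if num_insts < 0 or ipc <= 0:
        raise ValueError("num_insts must be >= 0 and ipc must be > 0")
    return math.ceil(num_insts / ipc)


# Cortex-A55 throughputs quoted in the log message (instructions per cycle).
SCALAR_FMUL_IPC = 2   # "FP multiply" FMUL
INDEXED_FMUL_IPC = 1  # "ASIMD FP multiply, by element" FMUL, FMULX

if __name__ == "__main__":
    n = 8  # hypothetical number of independent multiplies
    print(issue_bound_cycles(n, SCALAR_FMUL_IPC))   # 4
    print(issue_bound_cycles(n, INDEXED_FMUL_IPC))  # 8
```

Under these assumptions, selecting the scalar form halves the issue-bound
cycle count on Cortex-A55, while on Cortex-A510 the quoted figures suggest no
difference either way.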
Fixes #60817.
Differential Revision: https://reviews.llvm.org/D153207