[all-commits] [llvm/llvm-project] 5836ba: [AArch64] Change the cost of fma and fmuladd to ma...

Thu Aug 14 13:54:06 PDT 2025

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 5836bae463ce68e834e83231e443007d324ed89a
      https://github.com/llvm/llvm-project/commit/5836bae463ce68e834e83231e443007d324ed89a
  Author: David Green <david.green at arm.com>
  Date:   2025-08-14 (Thu, 14 Aug 2025)

  Changed paths:
    M llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
    M llvm/test/Analysis/CostModel/AArch64/sve-arith-fp.ll
    M llvm/test/Transforms/LoopVectorize/AArch64/f128-fmuladd-reduction.ll
    M llvm/test/Transforms/LoopVectorize/AArch64/veclib-intrinsic-calls.ll
    M llvm/test/Transforms/SLPVectorizer/AArch64/reused-scalar-repeated-in-node.ll
    M llvm/test/Transforms/SLPVectorizer/AArch64/vec3-reorder-reshuffle.ll
    M llvm/test/Transforms/SLPVectorizer/insertelement-postpone.ll

  Log Message:
  -----------
  [AArch64] Change the cost of fma and fmuladd to match fmul. (#152963)

As fmul and fmadd are so similar, their performance characteristics tend
to be the same on most platforms, at least in terms of reciprocal
throughputs. Processors capable of performing a given number of fmul per
cycle can usually perform the same number of fma, with the extra add
being relatively simple on top. This patch makes the scores of the two
operations the same, which brings the throughput cost of a fma/fmuladd
to 2, and the latency to 3, which are the defaults for fmul.

Note that we might also want to change the throughput cost of a fmul to
1, as most processors have ample bandwidth for them, but they should
still stay in-line with one another.

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications