[all-commits] [llvm/llvm-project] bec9ac: [llvm] Lower latency bonus threshold in function s...

Tue Jun 17 16:14:04 PDT 2025

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: bec9ac2dafe1c9fca975721e9951c5f7f6b1b559
      https://github.com/llvm/llvm-project/commit/bec9ac2dafe1c9fca975721e9951c5f7f6b1b559
  Author: Slava Zakharin <szakharin at nvidia.com>
  Date:   2025-06-17 (Tue, 17 Jun 2025)

  Changed paths:
    M llvm/lib/Transforms/IPO/FunctionSpecialization.cpp

  Log Message:
  -----------
  [llvm] Lower latency bonus threshold in function specialization. (#143954)

Related to #143219.

Function specialization does not kick in if flang sets `noalias`
attributes on the function arguments of `digits_2`, because PRE
optimizes several `srem` instructions and other memory accesses
from the inner loops causing the latency bonus to be lower than
the current 40% threshold.

While looking at this, I did not really get why we compute the latency
bonus as a ratio of the latency of the "eliminated" instructions
and the code-size of the whole function. It did not make much sense
to me.

I tried computing the total latency as a sum of latencies
of the instructions that belong to non-dead code (including
the instructions that would be executed had they not been
"eliminated" due to the constant propagation). This total
latency should identify the total cost of executing the function
with the given argument being dynamically equal to the tried
constant value. Then the latency bonus would be computed
as the ratio between the latency of the "eliminated" instructions
and the total latency. Unfortunately, this did not given me a good
heuristics either. The bonus was close to 0% on some targets,
and as big as 3-5% on other targets. This does match very well
with the performance gain achieved by function specialization
for exchange2, so it seemd like another artificial heuristic
not better than the current one.

It seems that GCC uses a set of different heuristics for function
specialization, but I am not an expert here and I cannot say
if we can match them in LLVM.

With all that said, I decided to try to lower the threshold
to avoid the regression and be able to re-enable the generally
good change for `noalias` attribute.

With this patch, I was able to reduce the effect of `noalias`,
so that `-force-no-alias=true` is only ~10% slower than
`-force-no-alias=false` code on neoverse-v1 and neoverse-v2.
On neoverse-n1, `-force-no-alias=true` is >2x faster than
`-force-no-alias=false` regardless of this patch.

This threshold has been changed before also due to improved
alias information:
https://github.com/llvm/llvm-project/commit/2fb51fba8ca904a6d3ddf30ae94228ecf9e6a231#diff-066363256b7b4164e66b28a3028b2cb9e405c9136241baa33db76ebd2edb87cd

Please let me know what testing I should run to make sure this change
is safe. As I understand, it may affect the compilation time
performance,
and I will appreciate it if someone points out which benchmarks
need to be checked before merging this.

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications