[llvm] [SystemZ] Remove getInliningThresholdMultiplier() override (PR #94612)

Tue Jul 23 03:46:46 PDT 2024

JonPsson1 wrote:

I have now looked into this, building SPEC with the x3 (main), x2 and x1 (no multiplier) settings. The code size changes relative to main are listed below. Changing to x2 does seem somewhat helpful for code size. Looking at the benchmarks (below), though, it is clear that this still loses performance in some cases.

```
Code size
Multiplier vs x3 (main)    x2        x1

i500.perlbench_r        -7.4%    -14.6%
i505.mcf_r              +0.0%     +0.0%
i523.xalancbmk_r        -6.6%    -11.0%
i531.deepsjeng_r        +0.1%     -2.8%
i502.gcc_r              -5.4%    -13.6%
i520.omnetpp_r          -4.0%     -9.6%
i525.x264_r             -3.8%    -13.3%
i541.leela_r            -1.0%     -9.1%
i557.xz_r               -3.1%     -6.2%
f507.cactuBSSN_r        -1.4%     -3.3%
f508.namd_r            -21.6%    -25.0%
f510.parest_r           -3.4%    -10.3%
f511.povray_r          -12.3%    -16.0%
f519.lbm_r              +0.0%     +0.0%
f521.wrf_r              -0.1%     -0.2%
f526.blender_r          -2.5%     -6.5%
f527.cam4_r             -0.0%     -0.1%
f538.imagick_r          -1.8%     -5.0%
f544.nab_r              -6.1%    -13.4%

Runtime vs main            x2        x1

f538.imagick_r           +13%      +15%
i500.perlbench_r          +2%       +6%
f511.povray_r             -1%       +3%
...

```
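To make the x1/x2/x3 labels concrete, here is a rough sketch of how a target multiplier scales the inline threshold. It is a simplified model, not the actual LLVM cost model: the base value of 250 is an assumption (the usual -O3 default), the function names are made up for illustration, and the bonuses and penalties the inliner applies elsewhere are ignored.

```python
# Simplified model: the target multiplier scales the base inline threshold.
# BASE_THRESHOLD_O3 = 250 is an assumption (the usual -O3 default); the real
# inliner applies additional bonuses and penalties that are ignored here.
BASE_THRESHOLD_O3 = 250


def effective_threshold(multiplier: int, base: int = BASE_THRESHOLD_O3) -> int:
    """Return the inline threshold after applying the target multiplier."""
    return base * multiplier


def would_inline(callee_cost: int, multiplier: int) -> bool:
    """In this model a call is inlined when its cost is below the threshold."""
    return callee_cost < effective_threshold(multiplier)


# Hypothetical names x1/x2/x3 mirror the labels used in the tables.
thresholds = {f"x{m}": effective_threshold(m) for m in (1, 2, 3)}
for name, threshold in thresholds.items():
    print(name, threshold)
```

Under this model a callee with an estimated cost of 600 would be inlined with x3 but not with x2, which is the kind of call site that accounts for the code size differences.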

Trying the reproducer:

/usr/bin/time -v ./bin/opt -O3 -mtriple=s390x-linux-gnu -mcpu=z16 ./bad-42752cbe095.bc -o /dev/null

x1
        User time (seconds): 4.81
        Maximum resident set size (kbytes): 132960
        Average resident set size (kbytes): 0

x2
        User time (seconds): 9.44
        Maximum resident set size (kbytes): 175012
        Average resident set size (kbytes): 0

x3
        Bad: Still running after 45 minutes, with memory usage slowly climbing to 30G and still growing.

So I can confirm getting the same results as reported - x2 would remedy this case.
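For a quick sense of what x2 costs relative to x1 on this reproducer, the two completed runs can be compared directly. This is only arithmetic on the numbers reported above; the dictionary layout and names are just for illustration, and x3 is left out because it never finished.

```python
# User time and peak RSS copied from the /usr/bin/time -v output above.
# x3 is omitted: that run did not complete, so there is nothing to compare.
measurements = {
    "x1": {"user_s": 4.81, "max_rss_kb": 132960},
    "x2": {"user_s": 9.44, "max_rss_kb": 175012},
}


def ratio(metric: str) -> float:
    """Return the x2 value divided by the x1 value for the given metric."""
    return measurements["x2"][metric] / measurements["x1"][metric]


time_ratio = ratio("user_s")
rss_ratio = ratio("max_rss_kb")

print(f"x2/x1 user time: {time_ratio:.2f}x")
print(f"x2/x1 max RSS:   {rss_ratio:.2f}x")
```

So x2 roughly doubles the compile time and adds about a third to peak memory for this file, which is still bounded, unlike x3.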

Looking at imagick, it is clear that in some cases increased inlining can give a very good performance improvement without affecting code size much. On the other hand, namd shows that inlining can increase code size significantly without giving much of an improvement...

https://github.com/llvm/llvm-project/pull/94612