[llvm] [SLP]Represent externally used values as original scalars, if profitable. (PR #100904)

Wed Nov 6 09:34:23 PST 2024

dgg5503 wrote:

Hi @alexey-bataev,

With the latest commit of LLVM 19.1.3 (`test` column - ab51eccf88f5321e7c60591c5546b254b6afab99), I observe an ~8% increase in execution time on my x86_64 6c/12t Ubuntu machine for the SPEC CPU 2017 tests `CFP2017speed/638.imagick_s/638.imagick_s.test` and `CFP2017rate/538.imagick_r/538.imagick_r.test`, compared to LLVM 18.1.8 (`baseline` column - 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff):
```
Tests: 11
Metric: exec_time

Program                                                                   exec_time             
                                                                          baseline  test   diff 
             test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   0.03      0.03 27.0%
             test-suite :: MultiSource/Applications/JM/lencod/lencod.test   2.82      2.80 -0.9%
                  test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   2.28      2.37  4.1%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   2.24      2.29  2.0%
          test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test   0.05      0.05 -0.4%
                  test-suite :: MultiSource/Benchmarks/nbench/nbench.test   0.88      0.86 -2.7%
            test-suite :: SPEC/CFP2017rate/510.parest_r/510.parest_r.test  26.08     26.73  2.5%
            test-suite :: SPEC/CFP2017rate/511.povray_r/511.povray_r.test   3.09      3.15  1.8%
          test-suite :: SPEC/CFP2017rate/526.blender_r/526.blender_r.test  96.17     96.14 -0.0%
          test-suite :: SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  20.92     22.76  8.8%
         test-suite :: SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  20.98     22.75  8.5%
                                                       Geomean difference                   4.3%
       exec_time                      
l/r     baseline       test       diff
count  11.000000  11.000000  11.000000
mean   15.958291  16.356445  0.046005 
std    28.342679  28.424672  0.082604 
min    0.026300   0.033400  -0.026621 
25%    1.561100   1.571350  -0.002055 
50%    2.821600   2.796500   0.019570 
75%    20.950250  22.752750  0.062887 
max    96.165100  96.142500  0.269962 
```
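As a rough illustration of how the `diff` column and the `Geomean difference` line can be read, here is a minimal Python sketch. It assumes `diff` is `test / baseline - 1` and that the geomean line is the geometric mean of the per-test ratios minus one; that reading is consistent with the numbers above, but it is an assumption, not a statement about how the reporting tool is implemented. The helper names are hypothetical, and the inputs are the rounded values from the table, so the last digit may differ from the table, which was computed from unrounded times.

```python
import math

# exec_time values in seconds, copied from the rounded table above as
# (baseline = LLVM 18.1.8, test = LLVM 19.1.3).
times = {
    "538.imagick_r": (20.92, 22.76),
    "638.imagick_s": (20.98, 22.75),
    "510.parest_r": (26.08, 26.73),
}


def relative_diff(baseline, test):
    """Relative change of test versus baseline; positive means slower."""
    return test / baseline - 1.0


def geomean_diff(pairs):
    """Geometric mean of the test/baseline ratios, minus one."""
    log_sum = sum(math.log(test / baseline) for baseline, test in pairs)
    return math.exp(log_sum / len(pairs)) - 1.0


for name, (baseline, test) in times.items():
    print(f"{name}: {relative_diff(baseline, test):+.1%}")

# Geomean over this three-test subset only, not the full 11-test run.
print(f"geomean (subset): {geomean_diff(list(times.values())):+.1%}")
```

Using a geometric mean of ratios keeps a large relative swing on a very short test (such as `ldecod` at 0.03 s) from being weighted by absolute time, but it also means such a test can move the summary line noticeably.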

I bisected this performance regression to commit 32994cc0d63513f77223c64148faeeb50aebb702; however, in my local testing, the regression was later fixed by this pull request.

Would it be possible to backport this pull request (b10ecfa914dd1bc2013584917d0505ba5f15f75c) and the changes it depends on to LLVM 19, on the grounds of improved performance? I am asking for your help because I am not familiar with the SLP Vectorizer code.

To help motivate this, I identified the following series of commits, which allows this pull request to be cherry-picked cleanly onto LLVM 19 at ab51eccf88f5321e7c60591c5546b254b6afab99:

* 5fc9502f19a87f7b1194cf5eadccf5f918bc50ca – NFC
* 1e1c8d16153a8b3f53b6a8797a77112ecc289551
* 197f4a90519df308d9bfddcc931f7683a5ae9cb9
* 6b1d13761ac0c9857763e5f4c0ae554f076dd9b7 – Fixes https://github.com/llvm/llvm-project/issues/101213, which is present in LLVM 19 (see https://godbolt.org/z/4Wc5Yj3eh)
* b5a7d3b6c2169d84f9da749425a38dcef914d1ce
* deb3ecf09fa30391bd22f890d2885c3d4816ca15 – NFC
* 799fd3d87bb15c37027c9c4451ab8c4dac3ca437
* 834ad102c377a4d1cdc6c601d9899b5dc0a1858b – NFC
* daf4a06e5c5531005b275b72681e04bd08e58fe4
* 441f94f4bdf6e2b9747ec12194c595098d9c3a5b
* 97743b8be86ab96afb26ba93e1876406c1f4d541
* 32c69faa6ce58333c26293a7708fa3f71991c55c
* b10ecfa914dd1bc2013584917d0505ba5f15f75c
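For anyone wanting to reproduce the backport locally, the series can be applied with a small script along these lines. This is only a sketch: it assumes the commits are applied in the order listed above onto a checkout at ab51eccf88f5321e7c60591c5546b254b6afab99, and the function names and the `dry_run` switch are hypothetical. The `-x` flag of `git cherry-pick` records the original commit hash in each picked commit's message.

```python
import subprocess

# Commits from the list above, in listed order, ending with this pull request.
SERIES = [
    "5fc9502f19a87f7b1194cf5eadccf5f918bc50ca",
    "1e1c8d16153a8b3f53b6a8797a77112ecc289551",
    "197f4a90519df308d9bfddcc931f7683a5ae9cb9",
    "6b1d13761ac0c9857763e5f4c0ae554f076dd9b7",
    "b5a7d3b6c2169d84f9da749425a38dcef914d1ce",
    "deb3ecf09fa30391bd22f890d2885c3d4816ca15",
    "799fd3d87bb15c37027c9c4451ab8c4dac3ca437",
    "834ad102c377a4d1cdc6c601d9899b5dc0a1858b",
    "daf4a06e5c5531005b275b72681e04bd08e58fe4",
    "441f94f4bdf6e2b9747ec12194c595098d9c3a5b",
    "97743b8be86ab96afb26ba93e1876406c1f4d541",
    "32c69faa6ce58333c26293a7708fa3f71991c55c",
    "b10ecfa914dd1bc2013584917d0505ba5f15f75c",
]


def cherry_pick_commands(commits):
    """Build one `git cherry-pick -x` invocation per commit, in order."""
    return [["git", "cherry-pick", "-x", sha] for sha in commits]


def apply_series(commits, repo_dir=".", dry_run=True):
    """Apply the commits in order; a failing pick raises and stops the run."""
    for cmd in cherry_pick_commands(commits):
        if dry_run:
            # Only show what would be run; nothing touches the repository.
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    apply_series(SERIES, dry_run=True)
```

Picking one commit per invocation means a conflict stops the run at the offending commit, which makes it easier to see which change in the series no longer applies.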

With the above set of commits applied, all LIT tests pass. The results below show the two highlighted `imagick` tests returning to LLVM 18 levels under LLVM 19, at least on x86_64:
```
Tests: 11
Metric: exec_time

Program                                                                   exec_time              
                                                                          baseline  test   diff  
             test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   0.03      0.02 -25.5%
             test-suite :: MultiSource/Applications/JM/lencod/lencod.test   2.82      2.79  -1.1%
                  test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   2.28      2.26  -0.7%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test   2.24      2.22  -1.0%
          test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test   0.05      0.05  -1.6%
                  test-suite :: MultiSource/Benchmarks/nbench/nbench.test   0.88      0.85  -3.3%
            test-suite :: SPEC/CFP2017rate/510.parest_r/510.parest_r.test  26.08     26.07  -0.0%
            test-suite :: SPEC/CFP2017rate/511.povray_r/511.povray_r.test   3.09      3.09  -0.2%
          test-suite :: SPEC/CFP2017rate/526.blender_r/526.blender_r.test  96.17     95.06  -1.1%
          test-suite :: SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  20.92     20.55  -1.8%
         test-suite :: SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  20.98     20.58  -1.9%
                                                       Geomean difference                   -3.8%
       exec_time                      
l/r     baseline       test       diff
count  11.000000  11.000000  11.000000
mean   15.958291  15.777064 -0.034704 
std    28.342679  28.022843  0.073510 
min    0.026300   0.019600  -0.254753 
25%    1.561100   1.535950  -0.018355 
50%    2.821600   2.790200  -0.011454 
75%    20.950250  20.565700 -0.008460 
max    96.165100  95.063600 -0.000391 
```
* `test` column - LLVM 19.1.3 @ ab51eccf88f5321e7c60591c5546b254b6afab99 + all commits mentioned above
* `baseline` column - LLVM 18.1.8 @ 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff

If any additional information or testing is desired, please let me know. I'd be happy to provide it.

Thank you,
Douglas Gliner

https://github.com/llvm/llvm-project/pull/100904