[llvm] [Pipelines] Move IPSCCP after inliner pipeline (PR #96620)

Tue Jun 25 03:59:56 PDT 2024

https://github.com/sihuan created https://github.com/llvm/llvm-project/pull/96620

This patch improves the performance of LLVM-compiled code on the SPEC2017:548.exchange2_r benchmark by over 40% on rv64gc.

While investigating the performance gap between GCC and LLVM on the SPEC2017:548.exchange2_r benchmark on RISC-V, we found that the gap stems mainly from constant propagation.
GCC splits the hot function `digits_2` into several specialized clones:
```console
$ objdump -D exchange2_r_gcc | grep "digits_2.*:$"
000000000001d480 <__brute_force_MOD_digits_2.isra.0>:
000000000001f0f6 <__brute_force_MOD_digits_2.constprop.7.isra.0>:
000000000001fdd0 <__brute_force_MOD_digits_2.constprop.6.isra.0>:
0000000000020900 <__brute_force_MOD_digits_2.constprop.5.isra.0>:
00000000000211c4 <__brute_force_MOD_digits_2.constprop.4.isra.0>:
0000000000022002 <__brute_force_MOD_digits_2.constprop.3.isra.0>:
0000000000022d6a <__brute_force_MOD_digits_2.constprop.2.isra.0>:
0000000000023898 <__brute_force_MOD_digits_2.constprop.1.isra.0>:
```
However, in LLVM, this function is not split:
```console
$ objdump -D exchange2_r_llvm | grep "digits_2.*:$"
00000000000115a0 <_QMbrute_forcePdigits_2>:
```
With this patch applied, LLVM specializes the function in a similar way, resulting in a substantial performance uplift:
```console
$ objdump -D exchange2_r_patched_llvm | grep "digits_2.*:$"
0000000000011ab0 <_QMbrute_forcePdigits_2>:
0000000000018a4e <_QMbrute_forcePdigits_2.specialized.1>:
0000000000019820 <_QMbrute_forcePdigits_2.specialized.2>:
000000000001a436 <_QMbrute_forcePdigits_2.specialized.3>:
000000000001ae78 <_QMbrute_forcePdigits_2.specialized.4>:
000000000001ba8e <_QMbrute_forcePdigits_2.specialized.5>:
000000000001c7e6 <_QMbrute_forcePdigits_2.specialized.6>:
000000000001d072 <_QMbrute_forcePdigits_2.specialized.7>:
000000000001dad0 <_QMbrute_forcePdigits_2.specialized.8>:
```
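To illustrate what the `.constprop` and `.specialized` symbols above represent, here is a minimal hand-written sketch of function specialization. It is not taken from the benchmark or from compiler output: `count_paths` and `count_paths_depth2` are hypothetical names, and the clone is written by hand to show the idea of fixing one argument to a constant and simplifying the body accordingly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical recursive routine; NOT the benchmark's digits_2.
// Callers are assumed to often pass a literal for `depth`.
std::uint64_t count_paths(int depth, int width) {
  assert(depth >= 0 && width >= 0);
  if (depth == 0)
    return 1;
  std::uint64_t total = 0;
  for (int i = 0; i < width; ++i)
    total += count_paths(depth - 1, width);
  return total;
}

// Hand-written equivalent of a clone specialized for depth == 2.
// With the argument known, the depth checks fold away and the
// recursion becomes two fixed nested loops.
std::uint64_t count_paths_depth2(int width) {
  assert(width >= 0);
  std::uint64_t total = 0;
  for (int i = 0; i < width; ++i)
    for (int j = 0; j < width; ++j)
      total += 1;
  return total;
}
```

A call such as `count_paths(2, 5)` can then be redirected to `count_paths_depth2(5)`; both return the same value, but the clone has no recursion or depth test left to execute.
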
We also used `perf stat` to measure the instruction count of `exchange2_r 0` on rv64gc:
| Compiler | Instructions |
|--------|--------|
| GCC #d28ea8e5 | 55,965,728,914 |
| LLVM #62d44fbd | 105,416,890,241 |
| LLVM #62d44fbd with this patch | 62,693,427,761 |

Tests on x86_64 show similar results:
| Compiler | cpu_atom instructions |
|--------|--------|
| LLVM #62d44fbd | 100,147,914,793 |
| LLVM #62d44fbd with this patch | 53,077,337,115 |
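
For reference, the relative reduction in executed instructions can be derived from the two tables above. The sketch below is only a convenience calculation: `reduction_percent` and the constant names are hypothetical, the counts are copied from the tables, and the result describes instruction counts, not run time.

```cpp
#include <cassert>
#include <cstdint>

// Percentage of instructions removed relative to the baseline count.
double reduction_percent(std::uint64_t baseline, std::uint64_t patched) {
  assert(baseline > 0 && patched <= baseline);
  return 100.0 * static_cast<double>(baseline - patched) /
         static_cast<double>(baseline);
}

// Instruction counts copied from the tables (LLVM #62d44fbd).
constexpr std::uint64_t kRv64Baseline = 105416890241ULL;
constexpr std::uint64_t kRv64Patched = 62693427761ULL;
constexpr std::uint64_t kX86Baseline = 100147914793ULL;
constexpr std::uint64_t kX86Patched = 53077337115ULL;
```

`reduction_percent(kRv64Baseline, kRv64Patched)` evaluates to roughly 40.5, and `reduction_percent(kX86Baseline, kX86Patched)` to roughly 47.0. On rv64gc the patched build still executes about 12% more instructions than the GCC build (62,693,427,761 versus 55,965,728,914).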



From abf211c35e39efc5d8f30019e10a14766985c185 Mon Sep 17 00:00:00 2001
From: SiHuaN <liyongtai at iscas.ac.cn>
Date: Tue, 25 Jun 2024 18:04:33 +0800
Subject: [PATCH] [Pipelines] Move IPSCCP after inliner pipeline

Moving the Interprocedural Constant Propagation (IPSCCP) pass to run after the
inliner pipeline can enhance optimization effectiveness. Performance uplift
for SPEC2017:548.exchange2_r on rv64gc is over 40%.
---
 llvm/lib/Passes/PassBuilderPipelines.cpp | 28 ++++++++++++------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 926515c9508a9..82e2690f4f441 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1118,20 +1118,6 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
 
   invokePipelineEarlySimplificationEPCallbacks(MPM, Level);
 
-  // Interprocedural constant propagation now that basic cleanup has occurred
-  // and prior to optimizing globals.
-  // FIXME: This position in the pipeline hasn't been carefully considered in
-  // years, it should be re-analyzed.
-  MPM.addPass(IPSCCPPass(
-              IPSCCPOptions(/*AllowFuncSpec=*/
-                            Level != OptimizationLevel::Os &&
-                            Level != OptimizationLevel::Oz &&
-                            !isLTOPreLink(Phase))));
-
-  // Attach metadata to indirect call sites indicating the set of functions
-  // they may target at run-time. This should follow IPSCCP.
-  MPM.addPass(CalledValuePropagationPass());
-
   // Optimize globals to try and fold them into constants.
   MPM.addPass(GlobalOptPass());
 
@@ -1204,6 +1190,20 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
   else
     MPM.addPass(buildInlinerPipeline(Level, Phase));
 
+  // Interprocedural constant propagation after the inliner pipeline yields
+  // better optimization results.
+  // FIXME: This position in the pipeline hasn't been carefully considered in
+  // years, it should be re-analyzed.
+  MPM.addPass(IPSCCPPass(
+              IPSCCPOptions(/*AllowFuncSpec=*/
+                            Level != OptimizationLevel::Os &&
+                            Level != OptimizationLevel::Oz &&
+                            !isLTOPreLink(Phase))));
+
+  // Attach metadata to indirect call sites indicating the set of functions
+  // they may target at run-time. This should follow IPSCCP.
+  MPM.addPass(CalledValuePropagationPass());
+
   // Remove any dead arguments exposed by cleanups, constant folding globals,
   // and argument promotion.
   MPM.addPass(DeadArgumentEliminationPass());