[llvm] [Pipelines] Move IPSCCP after inliner pipeline (PR #96620)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 25 03:59:56 PDT 2024
https://github.com/sihuan created https://github.com/llvm/llvm-project/pull/96620
This patch significantly improves the performance of LLVM-compiled code on the SPEC2017:548.exchange2_r benchmark, with an uplift of over 40% on rv64gc.
During our investigation into the significant performance disparity between GCC and LLVM on the SPEC2017:548.exchange2_r benchmark on RISC-V, we identified that the primary difference stems from constant propagation optimization.
In GCC, the hotspot function `digits_2` is split into several parts:
```console
$ objdump -D exchange2_r_gcc | grep "digits_2.*:$"
000000000001d480 <__brute_force_MOD_digits_2.isra.0>:
000000000001f0f6 <__brute_force_MOD_digits_2.constprop.7.isra.0>:
000000000001fdd0 <__brute_force_MOD_digits_2.constprop.6.isra.0>:
0000000000020900 <__brute_force_MOD_digits_2.constprop.5.isra.0>:
00000000000211c4 <__brute_force_MOD_digits_2.constprop.4.isra.0>:
0000000000022002 <__brute_force_MOD_digits_2.constprop.3.isra.0>:
0000000000022d6a <__brute_force_MOD_digits_2.constprop.2.isra.0>:
0000000000023898 <__brute_force_MOD_digits_2.constprop.1.isra.0>:
```
However, in LLVM, this function is not split:
```console
$ objdump -D exchange2_r_llvm | grep "digits_2.*:$"
00000000000115a0 <_QMbrute_forcePdigits_2>:
```
By applying this patch, LLVM now exhibits similar behavior, resulting in a substantial performance uplift.
```console
$ objdump -D exchange2_r_patched_llvm | grep "digits_2.*:$"
0000000000011ab0 <_QMbrute_forcePdigits_2>:
0000000000018a4e <_QMbrute_forcePdigits_2.specialized.1>:
0000000000019820 <_QMbrute_forcePdigits_2.specialized.2>:
000000000001a436 <_QMbrute_forcePdigits_2.specialized.3>:
000000000001ae78 <_QMbrute_forcePdigits_2.specialized.4>:
000000000001ba8e <_QMbrute_forcePdigits_2.specialized.5>:
000000000001c7e6 <_QMbrute_forcePdigits_2.specialized.6>:
000000000001d072 <_QMbrute_forcePdigits_2.specialized.7>:
000000000001dad0 <_QMbrute_forcePdigits_2.specialized.8>:
```
We used `perf stat` to measure the instruction count for `exchange2_r 0` on rv64gc; the results are shown in the table below:
| Compiler | Instructions |
|--------|--------|
| GCC #d28ea8e5 | 55,965,728,914 |
| LLVM #62d44fbd | 105,416,890,241 |
| LLVM #62d44fbd with this patch | 62,693,427,761 |
Tests on x86_64 yielded similar results:
| Compiler | cpu_atom instructions |
|--------|--------|
| LLVM #62d44fbd | 100,147,914,793 |
| LLVM #62d44fbd with this patch | 53,077,337,115 |
From abf211c35e39efc5d8f30019e10a14766985c185 Mon Sep 17 00:00:00 2001
From: SiHuaN <liyongtai at iscas.ac.cn>
Date: Tue, 25 Jun 2024 18:04:33 +0800
Subject: [PATCH] [Pipelines] Move IPSCCP after inliner pipeline
Moving the Interprocedural Constant Propagation (IPSCCP) pass to run after the
inliner pipeline can enhance optimization effectiveness. Performance uplift
for SPEC2017:548.exchange2_r on rv64gc is over 40%.
---
llvm/lib/Passes/PassBuilderPipelines.cpp | 28 ++++++++++++------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 926515c9508a9..82e2690f4f441 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1118,20 +1118,6 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
invokePipelineEarlySimplificationEPCallbacks(MPM, Level);
- // Interprocedural constant propagation now that basic cleanup has occurred
- // and prior to optimizing globals.
- // FIXME: This position in the pipeline hasn't been carefully considered in
- // years, it should be re-analyzed.
- MPM.addPass(IPSCCPPass(
- IPSCCPOptions(/*AllowFuncSpec=*/
- Level != OptimizationLevel::Os &&
- Level != OptimizationLevel::Oz &&
- !isLTOPreLink(Phase))));
-
- // Attach metadata to indirect call sites indicating the set of functions
- // they may target at run-time. This should follow IPSCCP.
- MPM.addPass(CalledValuePropagationPass());
-
// Optimize globals to try and fold them into constants.
MPM.addPass(GlobalOptPass());
@@ -1204,6 +1190,20 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
else
MPM.addPass(buildInlinerPipeline(Level, Phase));
+ // Interprocedural constant propagation after the inliner pipeline yields
+ // better optimization results.
+ // FIXME: This position in the pipeline hasn't been carefully considered in
+ // years, it should be re-analyzed.
+ MPM.addPass(IPSCCPPass(
+ IPSCCPOptions(/*AllowFuncSpec=*/
+ Level != OptimizationLevel::Os &&
+ Level != OptimizationLevel::Oz &&
+ !isLTOPreLink(Phase))));
+
+ // Attach metadata to indirect call sites indicating the set of functions
+ // they may target at run-time. This should follow IPSCCP.
+ MPM.addPass(CalledValuePropagationPass());
+
// Remove any dead arguments exposed by cleanups, constant folding globals,
// and argument promotion.
MPM.addPass(DeadArgumentEliminationPass());