[llvm] Swap UnrollAndJam Pass to before the SLP Vectorizer Pass (PR #97029)
via llvm-commits
llvm-commits at lists.llvm.org
Fri Jun 28 02:44:38 PDT 2024
https://github.com/adprasad-nvidia created https://github.com/llvm/llvm-project/pull/97029
This change moves the LoopUnrollAndJam pass before the SLPVectorizerPass.
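When LoopUnrollAndJam is enabled (via the EnableUnrollAndJam option, which is off by default), this allows some outer-loop vectorization in LLVM: jamming adjacent outer-loop iterations into a single inner-loop body creates isomorphic statements that the SLP vectorizer can then pack into vector instructions. This both improves runtimes and produces fewer lines of assembly for the same programs.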
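To illustrate the idea, here is a hand-written C++ sketch (not compiler output) of unroll-and-jam applied by hand to a hypothetical column-sum loop nest; the function names are illustrative only. The jammed inner body contains two isomorphic statements on adjacent addresses, which is the shape SLP looks for. Whether LLVM's LoopUnrollAndJamPass actually fires on a given nest, and whether SLP then finds it profitable, depends on its legality checks and the target cost model.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: column sums of a row-major rows x cols matrix.
// The inner j-loop walks A with a stride of `cols`, which typically makes
// inner-loop vectorization unprofitable.
std::vector<int> colSumsOriginal(const std::vector<int>& A, std::size_t rows,
                                 std::size_t cols) {
  std::vector<int> sums(cols, 0);
  for (std::size_t i = 0; i < cols; ++i)
    for (std::size_t j = 0; j < rows; ++j)
      sums[i] += A[j * cols + i];
  return sums;
}

// The same nest after unroll-and-jam of the outer i-loop by 2, done by hand.
// Each inner iteration now updates sums[i] and sums[i + 1] from A[j*cols+i]
// and A[j*cols+i+1]: two isomorphic adds on adjacent memory, which SLP can
// pack into one 2-wide vector add.
std::vector<int> colSumsUnrollAndJam(const std::vector<int>& A,
                                     std::size_t rows, std::size_t cols) {
  std::vector<int> sums(cols, 0);
  std::size_t i = 0;
  // Outer loop unrolled by 2; the two copies of the inner loop are jammed.
  for (; i + 1 < cols; i += 2)
    for (std::size_t j = 0; j < rows; ++j) {
      sums[i] += A[j * cols + i];          // from outer iteration i
      sums[i + 1] += A[j * cols + i + 1];  // from outer iteration i + 1
    }
  // Remainder loop handles an odd trailing column with the original body.
  for (; i < cols; ++i)
    for (std::size_t j = 0; j < rows; ++j)
      sums[i] += A[j * cols + i];
  return sums;
}
```

Both functions add the elements of each column in the same order, so they return identical results; only the loop structure differs.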
Note that when IsFullLTO is set, LoopUnrollAndJam already runs before the SLPVectorizerPass, so only the code that adds the pass on the non-full-LTO path is moved before the SLPVectorizerPass.
The change does not regress performance on TSVC-2, RAJAPerf, or CoreMark.
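The reordering on the non-full-LTO path can be sketched as a hypothetical, heavily simplified C++ model. It keeps only the three passes named in the diff (SLPVectorizerPass, LoopUnrollAndJamPass, LoopUnrollPass) and omits everything else; `vectorPassOrder` and `runsBefore` are illustrative names, not LLVM APIs. The model shows that the move keeps the earlier invariant that UnrollAndJam runs before LoopUnrollPass, while now also placing it before SLPVectorizerPass.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model of the non-full-LTO tail of addVectorPasses.
// `unrollAndJam` stands for (EnableUnrollAndJam && PTO.LoopUnrolling) and
// `slp` for PTO.SLPVectorization; `afterPatch` selects the new ordering.
std::vector<std::string> vectorPassOrder(bool unrollAndJam, bool slp,
                                         bool afterPatch) {
  std::vector<std::string> passes;
  if (afterPatch && unrollAndJam)
    passes.push_back("LoopUnrollAndJamPass");  // new position: before SLP
  if (slp)
    passes.push_back("SLPVectorizerPass");
  if (!afterPatch && unrollAndJam)
    passes.push_back("LoopUnrollAndJamPass");  // old position: after SLP
  // Added unconditionally in the diff (OnlyWhenForced = !PTO.LoopUnrolling).
  passes.push_back("LoopUnrollPass");
  return passes;
}

// True if both passes are present and `a` appears before `b` in `order`.
bool runsBefore(const std::vector<std::string>& order, const std::string& a,
                const std::string& b) {
  auto ia = std::find(order.begin(), order.end(), a);
  auto ib = std::find(order.begin(), order.end(), b);
  return ia != order.end() && ib != order.end() && ia < ib;
}
```

With both flags on, the old order is SLP, UnrollAndJam, Unroll, and the new order is UnrollAndJam, SLP, Unroll.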
From b6e59756e13d999c15a68107c94f8f69b7b1a5df Mon Sep 17 00:00:00 2001
From: adprasad <adprasad at nvidia.com>
Date: Tue, 18 Jun 2024 13:36:01 +0530
Subject: [PATCH 1/2] [UnJ] Move LoopUnrollAndJamPass before SLPVectorizerPass
---
llvm/lib/Passes/PassBuilderPipelines.cpp | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 926515c9508a9..9fd36e92be981 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1306,6 +1306,11 @@ void PassBuilder::addVectorPasses(OptimizationLevel Level,
FPM.addPass(BDCEPass());
}
+ // We do UnrollAndJam in a separate LPM to Unroll ensure it happens first.
+ if (EnableUnrollAndJam && PTO.LoopUnrolling) {
+ FPM.addPass(createFunctionToLoopPassAdaptor(
+ LoopUnrollAndJamPass(Level.getSpeedupLevel())));
+ }
// Optimize parallel scalar instruction chains into SIMD instructions.
if (PTO.SLPVectorization) {
FPM.addPass(SLPVectorizerPass());
@@ -1324,11 +1329,6 @@ void PassBuilder::addVectorPasses(OptimizationLevel Level,
// FIXME: It would be really good to use a loop-integrated instruction
// combiner for cleanup here so that the unrolling and LICM can be pipelined
// across the loop nests.
- // We do UnrollAndJam in a separate LPM to ensure it happens before unroll
- if (EnableUnrollAndJam && PTO.LoopUnrolling) {
- FPM.addPass(createFunctionToLoopPassAdaptor(
- LoopUnrollAndJamPass(Level.getSpeedupLevel())));
- }
FPM.addPass(LoopUnrollPass(LoopUnrollOptions(
Level.getSpeedupLevel(), /*OnlyWhenForced=*/!PTO.LoopUnrolling,
PTO.ForgetAllSCEVInLoopUnroll)));
From be8ed07c7945eb02c3447ef77da9f9b3eb400460 Mon Sep 17 00:00:00 2001
From: adprasad <adprasad at nvidia.com>
Date: Tue, 25 Jun 2024 15:26:56 +0530
Subject: [PATCH 2/2] [UnJ] Add comments explaining new position of
UnrollAndJam
---
llvm/lib/Passes/PassBuilderPipelines.cpp | 2 ++
1 file changed, 2 insertions(+)
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 9fd36e92be981..014df0e17c6a9 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1234,6 +1234,7 @@ void PassBuilder::addVectorPasses(OptimizationLevel Level,
// combiner for cleanup here so that the unrolling and LICM can be pipelined
// across the loop nests.
// We do UnrollAndJam in a separate LPM to ensure it happens before unroll
+ // In order for outer loop vectorization to be done, UnrollAndJam must occur before the SLPVectorizerPass.
if (EnableUnrollAndJam && PTO.LoopUnrolling)
FPM.addPass(createFunctionToLoopPassAdaptor(
LoopUnrollAndJamPass(Level.getSpeedupLevel())));
@@ -1307,6 +1308,7 @@ void PassBuilder::addVectorPasses(OptimizationLevel Level,
}
// We do UnrollAndJam in a separate LPM to Unroll ensure it happens first.
+ // In order for outer loop vectorization to be done, UnrollAndJam must occur before the SLPVectorizerPass.
if (EnableUnrollAndJam && PTO.LoopUnrolling) {
FPM.addPass(createFunctionToLoopPassAdaptor(
LoopUnrollAndJamPass(Level.getSpeedupLevel())));