[flang-commits] [flang] [llvm] [Pipeline] Add IPSCCPPass at O3 to fix function specialization phase ordering (PR #201543)

Thu Jun 4 03:26:45 PDT 2026

https://github.com/anoopkg6 created https://github.com/llvm/llvm-project/pull/201543

Add a second IPSCCPPass with function specialization enabled at -O3, directly after the inliner pipeline, in non-LTO compilations. This addresses a target-independent phase-ordering issue that stems from Fortran's pass-by-reference semantics.
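The gating condition added to buildModuleSimplificationPipeline can be modeled in a few lines. This is a hypothetical, self-contained sketch: `LTOPhase` and `moduleSimplificationOrder` are illustrative stand-ins rather than LLVM types or APIs, and only the passes named in this PR are listed.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::ThinOrFullLTOPhase: the enumerator names
// follow LLVM's, but this is a self-contained model, not LLVM's type.
enum class LTOPhase {
  None,
  ThinLTOPreLink,
  ThinLTOPostLink,
  FullLTOPreLink,
  FullLTOPostLink
};

// Hypothetical model of the module-level passes around the inliner after
// this patch. The real module simplification pipeline contains many more.
std::vector<std::string> moduleSimplificationOrder(unsigned SpeedupLevel,
                                                   LTOPhase Phase) {
  std::vector<std::string> Order = {"IPSCCPPass", "InlinerPipeline"};
  // The added gate: a second IPSCCP with function specialization, only at
  // speedup level 3 (-O3) and only outside the LTO phases.
  if (SpeedupLevel >= 3 && Phase == LTOPhase::None)
    Order.push_back("IPSCCPPass(AllowFuncSpec=true)");
  Order.push_back("DeadArgumentEliminationPass");
  return Order;
}
```

With this model, `moduleSimplificationOrder(3, LTOPhase::None)` yields four entries with the late IPSCCP third, while speedup level 2 (which covers -O2, -Os and -Oz) or any LTO phase yields three, matching the PASS-O2-PIPE / PASS-O3-PIPE expectations in the Flang test.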

Placing the new IPSCCPPass after buildInlinerPipeline lets it see the loops after LoopRotate (scheduled by the function simplification pipeline inside the inliner pipeline) has rewritten them from a while-loop into a do-while layout. This structural inversion copies the loop header, including the first load through the by-reference argument, out of the cycle into a single-predecessor entry block, splitting the critical edge. Memory disambiguation then resolves in straight-line code, allowing the late IPSCCP run to evaluate constant initializations that were previously blocked.
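A minimal C++ sketch of the control-flow change described above, assuming a loop whose header loads through a by-reference pointer, as `%val = load i32, ptr %ptr_input` does in the new `.ll` test. The function names are hypothetical; this models the shape of the transformation only, not LoopRotate's implementation or IPSCCP.

```cpp
// Hypothetical illustration of loop rotation (not LoopRotate's code).
// While form: the header loads through the by-reference pointer and then
// evaluates the exit test on every trip, including the first.
int lastLoadWhile(const int* P, int IV, int Limit) {
  int Val;
  for (;;) {
    Val = *P;              // header: load through the reference
    if (!(IV < Limit))     // header: exit test
      break;
    ++P;                   // body
    ++IV;
  }
  return Val;
}

// Rotated (do-while) form: a copy of the header runs once in straight-line
// code before the loop and doubles as the guard; the in-loop copy of the
// header moves to the bottom (the latch).
int lastLoadRotated(const int* P, int IV, int Limit) {
  int Val = *P;            // cloned header: first load is outside the cycle
  if (IV < Limit) {        // cloned exit test guards entry into the loop
    do {
      ++P;                 // body
      ++IV;
      Val = *P;            // header, now at the bottom of the loop
    } while (IV < Limit);  // latch test
  }
  return Val;
}
```

Both functions return the same value for every input. The difference is that in the rotated form the first read of the incoming pointer happens before the loop, in straight-line code, which mirrors the `%val1 = load i32, ptr %arg1` line that the NO-LATE-IPSCCP check expects after rotation.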

A cross-compilation check with --target=x86_64-unknown-linux-gnu confirms that the issue is target-independent: without this pipeline change, function specialization does not trigger.

Performance impact on SystemZ (548.exchange2_r):
- This patch (pass placed before addVectorPasses): ~56% improvement.
- Delayed placement (pass placed after addVectorPasses): only ~16% improvement.

Compile-time and run-time impact on SystemZ (SPEC CPU 2017):
- Total suite run time: ~1.75% geometric-mean improvement.
- Total suite compile time: negligible (-0.09% geometric-mean delta).
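The description does not state how its geometric-mean deltas were computed. As a hedged illustration of what such a figure means, one common approach takes per-benchmark ratios of patched over baseline time and averages them in log space. The function names below are hypothetical and no SPEC data is reproduced.

```cpp
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark ratios (patched / baseline), computed in
// log space to avoid overflow/underflow on long products.
double geomeanRatio(const std::vector<double>& Ratios) {
  if (Ratios.empty())
    return 1.0;  // no data: treat as "no change"
  double LogSum = 0.0;
  for (double R : Ratios)
    LogSum += std::log(R);
  return std::exp(LogSum / static_cast<double>(Ratios.size()));
}

// Geometric-mean delta in percent; e.g. -0.09 means 0.09% lower than baseline.
double geomeanDeltaPercent(const std::vector<double>& Ratios) {
  return (geomeanRatio(Ratios) - 1.0) * 100.0;
}
```

Under this convention, a compile-time ratio geomean of about 0.9991 corresponds to the -0.09% delta reported above.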

Placing the pass early delivers the full gain, whereas placing it after vectorization loses most of it (~16% vs. ~56%). This indicates that running it before vectorization is critical to avoid degrading downstream vectorization.

From 350b901acee8aca3253eb098b874c080b76db898 Mon Sep 17 00:00:00 2001
From: "anoop.kumar6 at ibm.com" <anoopk at b35lp63.lnxne.boe>
Date: Thu, 4 Jun 2026 00:17:25 +0200
Subject: [PATCH] [Pipeline] Add IPSCCPPass at O3 to fix function
 specialization phase ordering

Add a second IPSCCPPass with function specialization enabled at -O3,
directly after the inliner pipeline, in non-LTO compilations. This
addresses a target-independent phase-ordering issue that stems from
Fortran's pass-by-reference semantics.

Placing the new IPSCCPPass after buildInlinerPipeline lets it see the
loops after LoopRotate (scheduled by the function simplification
pipeline inside the inliner pipeline) has rewritten them from a
while-loop into a do-while layout. This structural inversion copies the
loop header, including the first load through the by-reference
argument, out of the cycle into a single-predecessor entry block,
splitting the critical edge. Memory disambiguation then resolves in
straight-line code, allowing the late IPSCCP run to evaluate constant
initializations that were previously blocked.

A cross-compilation check with --target=x86_64-unknown-linux-gnu
confirms that the issue is target-independent: without this pipeline
change, function specialization does not trigger.

Performance impact on SystemZ (548.exchange2_r):
- This patch (pass placed before addVectorPasses): ~56% improvement.
- Delayed placement (pass placed after addVectorPasses): only ~16%
  improvement.

Compile-time and run-time impact on SystemZ (SPEC CPU 2017):
- Total suite run time: ~1.75% geometric-mean improvement.
- Total suite compile time: negligible (-0.09% geometric-mean delta).

Placing the pass early delivers the full gain, whereas placing it after
vectorization loses most of it (~16% vs. ~56%). This indicates that
running it before vectorization is critical to avoid degrading
downstream vectorization.
---
 ...spec-phase-ordering-loop-rotate-ipsccp.f90 | 50 +++++++++++++++++++
 llvm/lib/Passes/PassBuilderPipelines.cpp      |  6 +++
 llvm/test/Other/new-pm-defaults.ll            |  5 +-
 ...ction-specialization-loop-rotate-ipsccp.ll | 48 ++++++++++++++++++
 .../dce-after-argument-promotion.ll           |  7 ++-
 5 files changed, 111 insertions(+), 5 deletions(-)
 create mode 100644 flang/test/Driver/funcspec-phase-ordering-loop-rotate-ipsccp.f90
 create mode 100644 llvm/test/Transforms/FunctionSpecialization/function-specialization-loop-rotate-ipsccp.ll

diff --git a/flang/test/Driver/funcspec-phase-ordering-loop-rotate-ipsccp.f90 b/flang/test/Driver/funcspec-phase-ordering-loop-rotate-ipsccp.f90
new file mode 100644
index 0000000000000..b18553b3bfd4b
--- /dev/null
+++ b/flang/test/Driver/funcspec-phase-ordering-loop-rotate-ipsccp.f90
@@ -0,0 +1,50 @@
+! RUN: %flang -O2 -mllvm -force-specialization -Xflang -fdebug-pass-manager -S -emit-llvm %s -o %t.O2.ll 2>%t.O2.stderr
+! RUN: FileCheck %s --check-prefix=PASS-O2-PIPE < %t.O2.stderr
+! RUN: FileCheck %s --check-prefix=PASS-O2-IR < %t.O2.ll
+!
+! RUN: %flang -O3 -mllvm -force-specialization -Xflang -fdebug-pass-manager -S -emit-llvm %s -o %t.O3.ll 2>%t.O3.stderr
+! RUN: FileCheck %s --check-prefix=PASS-O3-PIPE < %t.O3.stderr
+! RUN: FileCheck %s --check-prefix=PASS-O3-IR < %t.O3.ll
+
+module brute_force
+  implicit none
+  integer, public :: fallback_sink
+  integer, volatile, public :: global_fence = 1
+contains
+  subroutine top_level_caller()
+    integer :: temp
+    temp = 2
+    call digits_2(temp)
+  end subroutine top_level_caller
+
+  recursive subroutine digits_2(arg1)
+    integer, intent(in) :: arg1
+    integer :: temp_inner
+
+    if (global_fence == 1) then
+      if (arg1 == 2) then
+        temp_inner = arg1
+        call digits_2(temp_inner)
+      else
+        fallback_sink = arg1 * 5 + 12
+      end if
+    end if
+  end subroutine digits_2
+end module brute_force
+
+! PASS-O2-PIPE:      Running pass: IPSCCPPass on [module]
+! PASS-O2-PIPE:      Running pass: InlinerPass on (
+! PASS-O2-PIPE-NOT:  Running pass: IPSCCPPass on [module]
+! PASS-O2-PIPE:      Running pass: DeadArgumentEliminationPass on [module]
+
+! PASS-O2-IR-NOT:    .specialized.
+
+! PASS-O3-PIPE:      Running pass: IPSCCPPass on [module]
+! PASS-O3-PIPE:      Running pass: InlinerPass on (
+! PASS-O3-PIPE:      Running pass: IPSCCPPass on [module]
+! PASS-O3-PIPE:      Running pass: DeadArgumentEliminationPass on [module]
+
+! PASS-O3-IR:        define void @{{.*}}top_level_caller
+! PASS-O3-IR:        call fastcc void @{{.*}}digits_2{{.*}}.specialized.{{.*}}()
+! PASS-O3-IR:        define internal fastcc void @{{.*}}digits_2{{.*}}.specialized.1()
+! PASS-O3-IR:        define internal fastcc void @{{.*}}digits_2{{.*}}.specialized.2()
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 9eea552fd263e..2b0693e0d3248 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1323,6 +1323,12 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
   else
     MPM.addPass(buildInlinerPipeline(Level, Phase));
 
+  // Run a second IPSCCP pass at O3 to catch function specialization
+  // opportunities that loop rotation exposes for reference parameters
+  // (e.g., Fortran pass-by-reference).
+  if (Level.getSpeedupLevel() >= 3 && Phase == ThinOrFullLTOPhase::None)
+    MPM.addPass(llvm::IPSCCPPass(IPSCCPOptions(/*AllowFuncSpec=*/true)));
+
   // Remove any dead arguments exposed by cleanups, constant folding globals,
   // and argument promotion.
   MPM.addPass(DeadArgumentEliminationPass());
diff --git a/llvm/test/Other/new-pm-defaults.ll b/llvm/test/Other/new-pm-defaults.ll
index d6e51f451c3a4..ce470d26ab234 100644
--- a/llvm/test/Other/new-pm-defaults.ll
+++ b/llvm/test/Other/new-pm-defaults.ll
@@ -220,7 +220,10 @@
 ; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
-; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
+; CHECK-O1-NEXT: Running pass: DeadArgumentEliminationPass
+; CHECK-O2-NEXT: Running pass: DeadArgumentEliminationPass
+; CHECK-O3-NEXT: Running pass: IPSCCPPass
+; CHECK-O3: Running pass: DeadArgumentEliminationPass
 ; CHECK-O-NEXT: Running pass: CoroCleanupPass
 ; CHECK-O-NEXT: Running pass: GlobalOptPass
 ; CHECK-O-NEXT: Running pass: GlobalDCEPass
diff --git a/llvm/test/Transforms/FunctionSpecialization/function-specialization-loop-rotate-ipsccp.ll b/llvm/test/Transforms/FunctionSpecialization/function-specialization-loop-rotate-ipsccp.ll
new file mode 100644
index 0000000000000..d80c69344b8c0
--- /dev/null
+++ b/llvm/test/Transforms/FunctionSpecialization/function-specialization-loop-rotate-ipsccp.ll
@@ -0,0 +1,48 @@
+; RUN: opt -S -passes='ipsccp,function(loop(loop-rotate))' -force-specialization < %s | FileCheck %s --check-prefix=NO-LATE-IPSCCP
+; RUN: opt -S -passes='ipsccp,function(loop(loop-rotate)),ipsccp' -force-specialization < %s | FileCheck %s --check-prefix=WITH-LATE-IPSCCP
+
+@external_cond = external global i1, align 1
+
+define void @top_level_caller() {
+; WITH-LATE-IPSCCP-LABEL: define void @top_level_caller(
+; WITH-LATE-IPSCCP:        call void @digits_2.specialized.{{[0-9]+}}(
+
+; NO-LATE-IPSCCP-LABEL:   define void @top_level_caller(
+; NO-LATE-IPSCCP-NOT:       @digits_2.specialized
+entry:
+  %temp = alloca i32, align 4
+  store i32 2, ptr %temp, align 4
+  %cond = load i1, ptr @external_cond, align 1
+  %idx = select i1 %cond, i32 0, i32 99
+  call void @digits_2(ptr %temp, i32 %idx)
+  ret void
+}
+
+define internal void @digits_2(ptr %arg1, i32 %loop_idx) {
+; NO-LATE-IPSCCP-LABEL: define internal void @digits_2(
+; NO-LATE-IPSCCP:        %val1 = load i32, ptr %arg1
+; NO-LATE-IPSCCP:        br label %loop.body
+
+; WITH-LATE-IPSCCP-LABEL: define internal void @digits_2.specialized.{{[0-9]+}}(
+entry:
+  %temp_alloc = alloca i32, align 4
+  br label %loop.header
+
+loop.header:
+  %ptr_input = phi ptr [ %arg1, %entry ], [ %arrayidx, %loop.body ]
+  %iv = phi i32 [ %loop_idx, %entry ], [ %iv.next, %loop.body ]
+  %val = load i32, ptr %ptr_input, align 4
+  %cmp = icmp slt i32 %iv, 1
+  br i1 %cmp, label %loop.body, label %exit
+
+loop.body:
+  %idxprom = sext i32 %iv to i64
+  %arrayidx = getelementptr inbounds i32, ptr %ptr_input, i64 %idxprom
+  store i32 %val, ptr %arrayidx, align 4
+  %iv.next = add nsw i32 %iv, 1
+  call void @digits_2(ptr %temp_alloc, i32 %iv.next)
+  br label %loop.header
+
+exit:
+  ret void
+}
diff --git a/llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll b/llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll
index c33fcfbe6ed97..435112fccb744 100644
--- a/llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll
+++ b/llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll
@@ -9,10 +9,9 @@
 
 define internal void @f(ptr byval(%struct.ss) align 8 %b, ptr byval(i32) align 4 %X) noinline nounwind  {
 ; CHECK-LABEL: define {{[^@]+}}@f
-; CHECK-SAME: (i32 [[B_0:%.*]]){{[^#]*}} #[[ATTR0:[0-9]+]] {
+; CHECK-SAME: (){{[^#]*}} #[[ATTR0:[0-9]+]] {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[TEMP:%.*]] = add i32 [[B_0]], 1
-; CHECK-NEXT:    store i32 [[TEMP]], ptr [[DUMMY]], align 4
+; CHECK-NEXT:    store i32 2, ptr @dummy, align 4
 ; CHECK-NEXT:    ret void
 ;
 entry:
@@ -27,7 +26,7 @@ define i32 @test(ptr %X) {
 ; CHECK-LABEL: define {{[^@]+}}@test
 ; CHECK-SAME: (ptr {{[^%]*}} [[X:%.*]]){{[^#]*}} #[[ATTR1:[0-9]+]] {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    tail call {{.*}}void @f(i32 1)
+; CHECK-NEXT:    tail call fastcc void @f()
 ; CHECK-NEXT:    ret i32 0
 ;
 entry: