[flang-commits] [flang] [llvm] [Pipeline] Add IPSCCPPass at O3 to fix function specialization phase ordering (PR #201543)

Thu Jun 4 03:27:24 PDT 2026

llvmorg-github-actions[bot] wrote:




@llvm/pr-subscribers-function-specialization

Author: anoopkg6

<details>
<summary>Changes</summary>

      
Add a second IPSCCPPass with function specialization enabled at -O3 directly after the inliner pipeline during non-LTO phases. This addresses a target-independent phase ordering issue stemming from Fortran's pass-by-reference semantics.
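
As a rough illustration of why pass-by-reference hides the constant, here is a minimal C++ sketch. It is not taken from the patch: `digits_by_ref`, `digits_specialized_2`, and `caller` are hypothetical names, and the arithmetic only mirrors the `arg1 * 5 + 12` expression in the Fortran test. The caller stores 2 into a stack slot and passes its address, so the callee sees only a pointer until the load can be proven constant.

```cpp
// Hypothetical stand-in for a Fortran dummy argument: the callee receives
// the address of the caller's temporary, not the value itself.
int digits_by_ref(const int *arg) {
  int v = *arg;          // the constant is only visible through this load
  return v * 5 + 12;
}

// Conceptually, a specialization for the constant 2 drops the argument and
// lets the body fold to a constant.
int digits_specialized_2() {
  return 2 * 5 + 12;
}

// Caller in the pass-by-reference style: store the constant, pass its address.
int caller() {
  int temp = 2;
  return digits_by_ref(&temp);
}
```

Both `caller()` and `digits_specialized_2()` compute the same value; the specialized form is what the optimizer can reach only once it knows the pointee is the constant 2.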

Running IPSCCPPass after buildInlinerPipeline means it sees the IR after LoopRotate has rewritten the control flow from a while-loop to a do-while layout. This rotation duplicates the first execution of the loop body, hoisting it out of the loop into a single-predecessor entry block and splitting the critical edge. Memory disambiguation then resolves in straight-line code, allowing the late IPSCCP run to evaluate constant initializations that were previously blocked.
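
The while-to-do-while rewrite can be sketched at source level as follows. This is a hand-written C++ analogy, not compiler output, and `count_while` and `count_rotated` are hypothetical names. The point is that the first load used by the loop test moves into a guard that executes in straight-line code before the loop is entered.

```cpp
// While-loop form: the load of p[i] and the exit test sit in the loop
// header, which is reached both from the entry and from the back edge.
int count_while(const int *p) {
  int i = 0;
  while (p[i] != 0)
    ++i;
  return i;
}

// Rotated form: the first test is copied into a guard that runs once in
// the entry block; the loop itself becomes a do-while with the test at
// the bottom.
int count_rotated(const int *p) {
  int i = 0;
  if (p[0] != 0) {        // guard: first load, outside the loop
    do {
      ++i;
    } while (p[i] != 0);  // latch test
  }
  return i;
}
```

Both functions count the elements before the first zero, so they return the same result for any zero-terminated input; only the placement of the first load differs.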

A cross-compilation check with --target=x86_64-unknown-linux-gnu shows the same behavior, indicating that this phase ordering issue is target-independent: without this pipeline modification, function specialization fails to trigger.

Performance Impact on SystemZ (548.exchange2_r):
- Patched (Before addVectorPasses): ~56% performance improvement.
- Delayed Placement (After addVectorPasses): Only ~16% improvement.

Compile-Time & Run-Time Impact on SystemZ (SPEC CPU 2017):
- Total Suite Runtime: ~1.75% geometric mean improvement.
- Total Suite Compile-Time: ~0% impact (-0.09% geometric mean delta).
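
For readers unfamiliar with the metric, the following C++ sketch shows one common way to compute a geometric mean delta. It assumes the figure is taken over per-benchmark patched/baseline ratios; the post does not spell out the exact method. The helper names `geomean` and `geomean_delta_percent` are hypothetical.

```cpp
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark ratios (patched / baseline).
// Assumes a non-empty vector of strictly positive ratios.
double geomean(const std::vector<double> &ratios) {
  double log_sum = 0.0;
  for (double r : ratios)
    log_sum += std::log(r);
  return std::exp(log_sum / static_cast<double>(ratios.size()));
}

// Percentage delta relative to the baseline; a negative value means the
// patched measurement is lower than the baseline.
double geomean_delta_percent(const std::vector<double> &ratios) {
  return (geomean(ratios) - 1.0) * 100.0;
}
```

Under this convention, a ratio of 0.5 on one benchmark and 2.0 on another cancel out to a delta of zero, which is why a geometric mean is less sensitive to a single outlier than an arithmetic mean of ratios.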

Placing the pass early delivers the full optimization gain, whereas placing it after vectorization reduces the gain from ~56% to ~16%. This indicates that the pass needs to run before addVectorPasses to avoid degrading downstream vectorization.

---
Full diff: https://github.com/llvm/llvm-project/pull/201543.diff


5 Files Affected:

- (added) flang/test/Driver/funcspec-phase-ordering-loop-rotate-ipsccp.f90 (+50) 
- (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+6) 
- (modified) llvm/test/Other/new-pm-defaults.ll (+4-1) 
- (added) llvm/test/Transforms/FunctionSpecialization/function-specialization-loop-rotate-ipsccp.ll (+48) 
- (modified) llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll (+3-4) 


``````````diff

diff --git a/flang/test/Driver/funcspec-phase-ordering-loop-rotate-ipsccp.f90 b/flang/test/Driver/funcspec-phase-ordering-loop-rotate-ipsccp.f90
new file mode 100644
index 0000000000000..b18553b3bfd4b
--- /dev/null
+++ b/flang/test/Driver/funcspec-phase-ordering-loop-rotate-ipsccp.f90
@@ -0,0 +1,50 @@
+! RUN: %flang -O2 -mllvm -force-specialization -Xflang -fdebug-pass-manager -S -emit-llvm %s -o %t.O2.ll 2>%t.O2.stderr
+! RUN: FileCheck %s --check-prefix=PASS-O2-PIPE < %t.O2.stderr
+! RUN: FileCheck %s --check-prefix=PASS-O2-IR < %t.O2.ll
+!
+! RUN: %flang -O3 -mllvm -force-specialization -Xflang -fdebug-pass-manager -S -emit-llvm %s -o %t.O3.ll 2>%t.O3.stderr
+! RUN: FileCheck %s --check-prefix=PASS-O3-PIPE < %t.O3.stderr
+! RUN: FileCheck %s --check-prefix=PASS-O3-IR < %t.O3.ll
+
+module brute_force
+  implicit none
+  integer, public :: fallback_sink
+  integer, volatile, public :: global_fence = 1
+contains
+  subroutine top_level_caller()
+    integer :: temp
+    temp = 2
+    call digits_2(temp)
+  end subroutine top_level_caller
+
+  recursive subroutine digits_2(arg1)
+    integer, intent(in) :: arg1
+    integer :: temp_inner
+
+    if (global_fence == 1) then
+      if (arg1 == 2) then
+        temp_inner = arg1
+        call digits_2(temp_inner)
+      else
+        fallback_sink = arg1 * 5 + 12
+      end if
+    end if
+  end subroutine digits_2
+end module brute_force
+
+! PASS-O2-PIPE:      Running pass: IPSCCPPass on [module]
+! PASS-O2-PIPE:      Running pass: InlinerPass on (
+! PASS-O2-PIPE-NOT:  Running pass: IPSCCPPass on [module]
+! PASS-O2-PIPE:      Running pass: DeadArgumentEliminationPass on [module]
+
+! PASS-O2-IR-NOT:    .specialized.
+
+! PASS-O3-PIPE:      Running pass: IPSCCPPass on [module]
+! PASS-O3-PIPE:      Running pass: InlinerPass on (
+! PASS-O3-PIPE:      Running pass: IPSCCPPass on [module]
+! PASS-O3-PIPE:      Running pass: DeadArgumentEliminationPass on [module]
+
+! PASS-O3-IR:        define void @{{.*}}top_level_caller
+! PASS-O3-IR:        call fastcc void @{{.*}}digits_2{{.*}}.specialized.{{.*}}()
+! PASS-O3-IR:        define internal fastcc void @{{.*}}digits_2{{.*}}.specialized.1()
+! PASS-O3-IR:        define internal fastcc void @{{.*}}digits_2{{.*}}.specialized.2()
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 9eea552fd263e..2b0693e0d3248 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1323,6 +1323,12 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
   else
     MPM.addPass(buildInlinerPipeline(Level, Phase));
 
+  // Run a second IPSCCP pass at O3 to resolve function specialization
+  // opportunities exposed by loop rotation for reference parameters
+  // (e.g., Fortran pass-by-reference).
+  if (Level.getSpeedupLevel() >= 3 && Phase == ThinOrFullLTOPhase::None)
+    MPM.addPass(llvm::IPSCCPPass(IPSCCPOptions(/*AllowFuncSpec=*/true)));
+
   // Remove any dead arguments exposed by cleanups, constant folding globals,
   // and argument promotion.
   MPM.addPass(DeadArgumentEliminationPass());
diff --git a/llvm/test/Other/new-pm-defaults.ll b/llvm/test/Other/new-pm-defaults.ll
index d6e51f451c3a4..ce470d26ab234 100644
--- a/llvm/test/Other/new-pm-defaults.ll
+++ b/llvm/test/Other/new-pm-defaults.ll
@@ -220,7 +220,10 @@
 ; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
-; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
+; CHECK-O1-NEXT: Running pass: DeadArgumentEliminationPass
+; CHECK-O2-NEXT: Running pass: DeadArgumentEliminationPass
+; CHECK-O3-NEXT: Running pass: IPSCCPPass
+; CHECK-O3: Running pass: DeadArgumentEliminationPass
 ; CHECK-O-NEXT: Running pass: CoroCleanupPass
 ; CHECK-O-NEXT: Running pass: GlobalOptPass
 ; CHECK-O-NEXT: Running pass: GlobalDCEPass
diff --git a/llvm/test/Transforms/FunctionSpecialization/function-specialization-loop-rotate-ipsccp.ll b/llvm/test/Transforms/FunctionSpecialization/function-specialization-loop-rotate-ipsccp.ll
new file mode 100644
index 0000000000000..d80c69344b8c0
--- /dev/null
+++ b/llvm/test/Transforms/FunctionSpecialization/function-specialization-loop-rotate-ipsccp.ll
@@ -0,0 +1,48 @@
+; RUN: opt -S -passes='ipsccp,function(loop(loop-rotate))' -force-specialization < %s | FileCheck %s --check-prefix=NO-LATE-IPSCCP
+; RUN: opt -S -passes='ipsccp,function(loop(loop-rotate)),ipsccp' -force-specialization < %s | FileCheck %s --check-prefix=WITH-LATE-IPSCCP
+
+@external_cond = external global i1, align 1
+
+define void @top_level_caller() {
+; WITH-LATE-IPSCCP-LABEL: define void @top_level_caller(
+; WITH-LATE-IPSCCP:        call void @digits_2.specialized.{{[0-9]+}}(
+
+; NO-LATE-IPSCCP-LABEL:   define void @top_level_caller(
+; NO-LATE-IPSCCP-NOT:       @digits_2.specialized
+entry:
+  %temp = alloca i32, align 4
+  store i32 2, ptr %temp, align 4
+  %cond = load i1, ptr @external_cond, align 1
+  %idx = select i1 %cond, i32 0, i32 99
+  call void @digits_2(ptr %temp, i32 %idx)
+  ret void
+}
+
+define internal void @digits_2(ptr %arg1, i32 %loop_idx) {
+; NO-LATE-IPSCCP-LABEL: define internal void @digits_2(
+; NO-LATE-IPSCCP:        %val1 = load i32, ptr %arg1
+; NO-LATE-IPSCCP:        br label %loop.body
+
+; WITH-LATE-IPSCCP-LABEL: define internal void @digits_2.specialized.{{[0-9]+}}(
+entry:
+  %temp_alloc = alloca i32, align 4
+  br label %loop.header
+
+loop.header:
+  %ptr_input = phi ptr [ %arg1, %entry ], [ %arrayidx, %loop.body ]
+  %iv = phi i32 [ %loop_idx, %entry ], [ %iv.next, %loop.body ]
+  %val = load i32, ptr %ptr_input, align 4
+  %cmp = icmp slt i32 %iv, 1
+  br i1 %cmp, label %loop.body, label %exit
+
+loop.body:
+  %idxprom = sext i32 %iv to i64
+  %arrayidx = getelementptr inbounds i32, ptr %ptr_input, i64 %idxprom
+  store i32 %val, ptr %arrayidx, align 4
+  %iv.next = add nsw i32 %iv, 1
+  call void @digits_2(ptr %temp_alloc, i32 %iv.next)
+  br label %loop.header
+
+exit:
+  ret void
+}
diff --git a/llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll b/llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll
index c33fcfbe6ed97..435112fccb744 100644
--- a/llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll
+++ b/llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll
@@ -9,10 +9,9 @@
 
 define internal void @f(ptr byval(%struct.ss) align 8 %b, ptr byval(i32) align 4 %X) noinline nounwind  {
 ; CHECK-LABEL: define {{[^@]+}}@f
-; CHECK-SAME: (i32 [[B_0:%.*]]){{[^#]*}} #[[ATTR0:[0-9]+]] {
+; CHECK-SAME: (){{[^#]*}} #[[ATTR0:[0-9]+]] {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[TEMP:%.*]] = add i32 [[B_0]], 1
-; CHECK-NEXT:    store i32 [[TEMP]], ptr [[DUMMY]], align 4
+; CHECK-NEXT:    store i32 2, ptr @dummy, align 4
 ; CHECK-NEXT:    ret void
 ;
 entry:
@@ -27,7 +26,7 @@ define i32 @test(ptr %X) {
 ; CHECK-LABEL: define {{[^@]+}}@test
 ; CHECK-SAME: (ptr {{[^%]*}} [[X:%.*]]){{[^#]*}} #[[ATTR1:[0-9]+]] {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    tail call {{.*}}void @f(i32 1)
+; CHECK-NEXT:    tail call fastcc void @f()
 ; CHECK-NEXT:    ret i32 0
 ;
 entry:

``````````

</details>


https://github.com/llvm/llvm-project/pull/201543