[llvm-branch-commits] AlwaysInliner: A new inlining algorithm to interleave alloca promotion with inlines. (PR #145613)
Amara Emerson via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Wed Jun 25 12:30:24 PDT 2025
================
@@ -129,6 +147,245 @@ bool AlwaysInlineImpl(
return Changed;
}
+/// Promote allocas to registers if possible.
+static void promoteAllocas(
+ Function *Caller, SmallPtrSetImpl<AllocaInst *> &AllocasToPromote,
+ function_ref<AssumptionCache &(Function &)> &GetAssumptionCache) {
+ if (AllocasToPromote.empty())
+ return;
+
+ SmallVector<AllocaInst *, 4> PromotableAllocas;
+ llvm::copy_if(AllocasToPromote, std::back_inserter(PromotableAllocas),
+ isAllocaPromotable);
+ if (PromotableAllocas.empty())
+ return;
+
+ DominatorTree DT(*Caller);
+ AssumptionCache &AC = GetAssumptionCache(*Caller);
+ PromoteMemToReg(PromotableAllocas, DT, &AC);
+ NumAllocasPromoted += PromotableAllocas.size();
+ // Emit a remark for the promotion.
+ OptimizationRemarkEmitter ORE(Caller);
+ DebugLoc DLoc = Caller->getEntryBlock().getTerminator()->getDebugLoc();
+ ORE.emit([&]() {
+ return OptimizationRemark(DEBUG_TYPE, "PromoteAllocas", DLoc,
+ &Caller->getEntryBlock())
+ << "Promoting " << ore::NV("NumAlloca", PromotableAllocas.size())
+ << " allocas to SSA registers in function '"
+ << ore::NV("Function", Caller) << "'";
+ });
+ LLVM_DEBUG(dbgs() << "Promoted " << PromotableAllocas.size()
+ << " allocas to registers in function " << Caller->getName()
+ << "\n");
+}
+
+/// We use a different visitation order of functions here to solve a phase
+/// ordering problem. After inlining, a caller function may have allocas that
+/// were previously used for passing reference arguments to the callee that
+/// are now promotable to registers using SROA/mem2reg. However, if we just let
+/// the AlwaysInliner continue inlining everything at once, the later SROA pass
+/// in the pipeline will end up placing phis for these allocas into blocks along
+/// the dominance frontier which may extend further than desired (e.g. loop
+/// headers). This can happen when the caller is then inlined into another
+/// caller, and the allocas end up hoisted further before SROA is run.
+///
+/// Instead, what we want to do, as best we can, is inline leaf functions into
+/// their callers and then run PromoteMemToReg() on the allocas that were
+/// passed into the callee before it was inlined.
+///
+/// We want to do this *before* the caller is inlined into another caller
+/// because we want the alloca promotion to happen before its scope extends too
+/// far because of further inlining.
+///
+/// Here's a simple pseudo-example:
+/// outermost_caller() {
+/// for (...) {
+/// middle_caller();
+/// }
+/// }
+///
+/// middle_caller() {
+/// int stack_var;
+/// inner_callee(&stack_var);
+/// }
+///
+/// inner_callee(int *x) {
+/// // Do something with x.
+/// }
+///
+/// In this case, we want to inline inner_callee() into middle_caller() and
+/// then promote stack_var to a register before we inline middle_caller() into
+/// outermost_caller(). The regular AlwaysInliner would inline everything at
+/// once, and then SROA/mem2reg would promote stack_var to a register, but in
+/// the context of outermost_caller(), which is not what we want.
----------------
aemerson wrote:
Yes, the traversal order matters here, because for optimal codegen we want mem2reg to happen between the inner->middle and middle->outer inlines. If you do it the other way around, mem2reg can't do anything until the final inner->outer inline, and by that point it's too late.
For now I think only this promotion is a known issue, I don't know of general issues with simplification.
https://github.com/llvm/llvm-project/pull/145613