[llvm] [ctx_prof] Add Inlining support (PR #106154)

Mingming Liu via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 29 21:10:45 PDT 2024


================
@@ -2116,6 +2119,203 @@ inlineRetainOrClaimRVCalls(CallBase &CB, objcarc::ARCInstKind RVCallKind,
   }
 }
 
+// In contextual profiling, when an inline succeeds, we want to remap the
+// indices of the callee in the index space of the caller. We can't just leave
+// them as-is because the same callee may appear in other places in this caller
+// (other callsites), and its (callee's) counters and sub-contextual profile
+// tree would be potentially different.
+// Not all BBs of the callee may survive the opportunistic DCE InlineFunction
+// does (same goes for callsites in the callee).
+// We will return a pair of vectors, one for basic block IDs and one for
+// callsites. For such a vector V, V[Idx] will be -1 if the callee
+// instrumentation with index Idx did not survive inlining, and a new value
+// otherwise.
+// This function will update the instrumentation intrinsics accordingly,
+// mapping indices as described above. We also replace the "name" operand
+// because we use it to distinguish between "own" instrumentation and "from
+// callee" instrumentation when performing the traversal of the CFG of the
+// caller. We traverse depth-first from the callsite's BB and up to the point we
+// hit owned BBs.
+// The return values will be then used to update the contextual
+// profile. Note: we only update the "name" and "index" operands in the
+// instrumentation intrinsics, we leave the hash and total nr of indices as-is,
+// it's not worth updating those.
+static const std::pair<std::vector<int64_t>, std::vector<int64_t>>
+remapIndices(Function &Caller, BasicBlock *StartBB,
+             CtxProfAnalysis::Result &CtxProf, uint32_t CalleeCounters,
+             uint32_t CalleeCallsites) {
+  // We'll allocate a new ID to imported callsite counters and callsites. We're
+  // using -1 to indicate a counter we delete. Most likely the entry, for
+  // example, will be deleted - we don't want 2 IDs in the same BB, and the
+  // entry would have been cloned in the callsite's old BB.
+  std::vector<int64_t> CalleeCounterMap;
+  std::vector<int64_t> CalleeCallsiteMap;
+  CalleeCounterMap.resize(CalleeCounters, -1);
+  CalleeCallsiteMap.resize(CalleeCallsites, -1);
+
+  auto RewriteInstrIfNeeded = [&](InstrProfIncrementInst &Ins) -> bool {
+    if (Ins.getNameValue() == &Caller)
+      return false;
+    const auto OldID = static_cast<uint32_t>(Ins.getIndex()->getZExtValue());
+    if (CalleeCounterMap[OldID] == -1)
+      CalleeCounterMap[OldID] = CtxProf.allocateNextCounterIndex(Caller);
+    const auto NewID = static_cast<uint32_t>(CalleeCounterMap[OldID]);
+
+    Ins.setNameValue(&Caller);
+    Ins.setIndex(NewID);
+    return true;
+  };
+
+  auto RewriteCallsiteInsIfNeeded = [&](InstrProfCallsite &Ins) -> bool {
+    if (Ins.getNameValue() == &Caller)
+      return false;
+    const auto OldID = static_cast<uint32_t>(Ins.getIndex()->getZExtValue());
+    if (CalleeCallsiteMap[OldID] == -1)
+      CalleeCallsiteMap[OldID] = CtxProf.allocateNextCallsiteIndex(Caller);
+    const auto NewID = static_cast<uint32_t>(CalleeCallsiteMap[OldID]);
+
+    Ins.setNameValue(&Caller);
+    Ins.setIndex(NewID);
+    return true;
+  };
+
+  std::deque<BasicBlock *> Worklist;
+  DenseSet<const BasicBlock *> Seen;
+  // We will traverse the BBs starting from the callsite BB. The callsite BB
+  // will have at least a BB ID - maybe its own, and in any case the one coming
+  // from the cloned function's entry BB. The other BBs we'll start seeing from
+  // there on may or may not have BB IDs. BBs with IDs belonging to our caller
+  // are definitely not coming from the imported function and form a boundary
+  // past which we don't need to traverse anymore. BBs may have no
+  // instrumentation (because we originally inserted instrumentation as per
+  // MST), in which case we'll traverse past them. An invariant we'll keep is
+  // that a BB will have at most 1 BB ID. For example, in the callsite BB, we
+  // will delete the callee BB's instrumentation. This doesn't result in
+  // information loss: the entry BB of the caller will have the same count as
+  // the callsite's BB. At the end of this traversal, all the callee's
+  // instrumentation would be mapped into the caller's instrumentation index
+  // space. Some of the callee's counters may be deleted (as mentioned, this
+  // should result in no loss of information).
+  Worklist.push_back(StartBB);
+  while (!Worklist.empty()) {
+    auto *BB = Worklist.front();
+    Worklist.pop_front();
+    bool Changed = false;
+    auto *BBID = CtxProfAnalysis::getBBInstrumentation(*BB);
+    if (BBID) {
+      Changed |= RewriteInstrIfNeeded(*BBID);
+      // this may be the entryblock from the inlined callee, coming into a BB
+      // that didn't have instrumentation because of MST decisions. Let's make
+      // sure it's placed accordingly. This is a noop elsewhere.
+      BBID->moveBefore(&*BB->getFirstInsertionPt());
+    }
+    for (auto &I : llvm::make_early_inc_range(*BB)) {
+      if (auto *Inc = dyn_cast<InstrProfIncrementInst>(&I)) {
+        if (Inc != BBID) {
+          Inc->eraseFromParent();
+          Changed = true;
+        }
+      } else if (auto *CS = dyn_cast<InstrProfCallsite>(&I)) {
+        Changed |= RewriteCallsiteInsIfNeeded(*CS);
+      }
+    }
+    if (!BBID || Changed)
+      for (auto *Succ : successors(BB))
+        if (Seen.insert(Succ).second)
+          Worklist.push_back(Succ);
+  }
+  return {std::move(CalleeCounterMap), std::move(CalleeCallsiteMap)};
+}
+
+llvm::InlineResult llvm::InlineFunction(CallBase &CB, InlineFunctionInfo &IFI,
+                                        CtxProfAnalysis::Result &CtxProf,
+                                        bool MergeAttributes,
+                                        AAResults *CalleeAAR,
+                                        bool InsertLifetime,
+                                        Function *ForwardVarArgsTo) {
+  if (!CtxProf)
+    return InlineFunction(CB, IFI, MergeAttributes, CalleeAAR, InsertLifetime,
+                          ForwardVarArgsTo);
+
+  auto &Caller = *CB.getCaller();
+  auto &Callee = *CB.getCalledFunction();
+  auto *StartBB = CB.getParent();
+
+  // Get some preliminary data about the callsite before it might get inlined.
+  // Inlining shouldn't delete the callee, but it's cleaner (and low-cost) to
+  // get this data upfront and rely less on InlineFunction's behavior.
+  const auto CalleeGUID = AssignGUIDPass::getGUID(Callee);
+  auto *CallsiteIDIns = CtxProfAnalysis::getCallsiteInstrumentation(CB);
+  const auto CallsiteID =
+      static_cast<uint32_t>(CallsiteIDIns->getIndex()->getZExtValue());
+
+  const auto NrCalleeCounters = CtxProf.getNrCounters(Callee);
+  const auto NrCalleeCallsites = CtxProf.getNrCallsites(Callee);
+
+  auto Ret = InlineFunction(CB, IFI, MergeAttributes, CalleeAAR, InsertLifetime,
+                            ForwardVarArgsTo);
+  if (!Ret.isSuccess())
+    return Ret;
----------------
minglotus-6 wrote:

naive question, what are the scenarios for un-successful inlines if contextual profile shows a <caller, inlined-callee> pair?

https://github.com/llvm/llvm-project/pull/106154


More information about the llvm-commits mailing list