[llvm] [ctx_prof] Add Inlining support (PR #106154)
Mircea Trofin via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 30 13:42:57 PDT 2024
================
@@ -2116,6 +2119,203 @@ inlineRetainOrClaimRVCalls(CallBase &CB, objcarc::ARCInstKind RVCallKind,
}
}
+// In contextual profiling, when an inline succeeds, we want to remap the
+// indices of the callee in the index space of the caller. We can't just leave
+// them as-is because the same callee may appear in other places in this caller
+// (other callsites), and its (callee's) counters and sub-contextual profile
+// tree would be potentially different.
+// Not all BBs of the callee may survive the opportunistic DCE InlineFunction
+// does (same goes for callsites in the callee).
+// We will return a pair of vectors, one for basic block IDs and one for
+// callsites. For such a vector V, V[Idx] will be -1 if the callee
+// instrumentation with index Idx did not survive inlining, and a new value
+// otherwise.
+// This function will update the instrumentation intrinsics accordingly,
+// mapping indices as described above. We also replace the "name" operand
+// because we use it to distinguish between "own" instrumentation and "from
+// callee" instrumentation when performing the traversal of the CFG of the
+// caller. We traverse depth-first from the callsite's BB and up to the point we
+// hit owned BBs.
+// The return values will be then used to update the contextual
+// profile. Note: we only update the "name" and "index" operands in the
+// instrumentation intrinsics, we leave the hash and total nr of indices as-is,
+// it's not worth updating those.
+static const std::pair<std::vector<int64_t>, std::vector<int64_t>>
+remapIndices(Function &Caller, BasicBlock *StartBB,
+ CtxProfAnalysis::Result &CtxProf, uint32_t CalleeCounters,
+ uint32_t CalleeCallsites) {
+ // We'll allocate a new ID to imported callsite counters and callsites. We're
+ // using -1 to indicate a counter we delete. Most likely the entry, for
+ // example, will be deleted - we don't want 2 IDs in the same BB, and the
+ // entry would have been cloned in the callsite's old BB.
+ std::vector<int64_t> CalleeCounterMap;
+ std::vector<int64_t> CalleeCallsiteMap;
+ CalleeCounterMap.resize(CalleeCounters, -1);
+ CalleeCallsiteMap.resize(CalleeCallsites, -1);
+
+ auto RewriteInstrIfNeeded = [&](InstrProfIncrementInst &Ins) -> bool {
+ if (Ins.getNameValue() == &Caller)
+ return false;
+ const auto OldID = static_cast<uint32_t>(Ins.getIndex()->getZExtValue());
+ if (CalleeCounterMap[OldID] == -1)
+ CalleeCounterMap[OldID] = CtxProf.allocateNextCounterIndex(Caller);
+ const auto NewID = static_cast<uint32_t>(CalleeCounterMap[OldID]);
+
+ Ins.setNameValue(&Caller);
+ Ins.setIndex(NewID);
+ return true;
+ };
+
+ auto RewriteCallsiteInsIfNeeded = [&](InstrProfCallsite &Ins) -> bool {
+ if (Ins.getNameValue() == &Caller)
+ return false;
+ const auto OldID = static_cast<uint32_t>(Ins.getIndex()->getZExtValue());
+ if (CalleeCallsiteMap[OldID] == -1)
+ CalleeCallsiteMap[OldID] = CtxProf.allocateNextCallsiteIndex(Caller);
+ const auto NewID = static_cast<uint32_t>(CalleeCallsiteMap[OldID]);
+
+ Ins.setNameValue(&Caller);
+ Ins.setIndex(NewID);
+ return true;
+ };
+
+ std::deque<BasicBlock *> Worklist;
+ DenseSet<const BasicBlock *> Seen;
+ // We will traverse the BBs starting from the callsite BB. The callsite BB
+ // will have at least a BB ID - maybe its own, and in any case the one coming
+ // from the cloned function's entry BB. The other BBs we'll start seeing from
+ // there on may or may not have BB IDs. BBs with IDs belonging to our caller
+ // are definitely not coming from the imported function and form a boundary
+ // past which we don't need to traverse anymore. BBs may have no
+ // instrumentation (because we originally inserted instrumentation as per
+ // MST), in which case we'll traverse past them. An invariant we'll keep is
+ // that a BB will have at most 1 BB ID. For example, in the callsite BB, we
+ // will delete the callee BB's instrumentation. This doesn't result in
+ // information loss: the entry BB of the caller will have the same count as
+ // the callsite's BB. At the end of this traversal, all the callee's
+ // instrumentation would be mapped into the caller's instrumentation index
+ // space. Some of the callee's counters may be deleted (as mentioned, this
+ // should result in no loss of information).
+ Worklist.push_back(StartBB);
+ while (!Worklist.empty()) {
+ auto *BB = Worklist.front();
+ Worklist.pop_front();
+ bool Changed = false;
+ auto *BBID = CtxProfAnalysis::getBBInstrumentation(*BB);
+ if (BBID) {
+ Changed |= RewriteInstrIfNeeded(*BBID);
+ // this may be the entryblock from the inlined callee, coming into a BB
+ // that didn't have instrumentation because of MST decisions. Let's make
+ // sure it's placed accordingly. This is a noop elsewhere.
+ BBID->moveBefore(&*BB->getFirstInsertionPt());
+ }
+ for (auto &I : llvm::make_early_inc_range(*BB)) {
+ if (auto *Inc = dyn_cast<InstrProfIncrementInst>(&I)) {
+ if (Inc != BBID) {
+ Inc->eraseFromParent();
+ Changed = true;
+ }
+ } else if (auto *CS = dyn_cast<InstrProfCallsite>(&I)) {
+ Changed |= RewriteCallsiteInsIfNeeded(*CS);
+ }
+ }
+ if (!BBID || Changed)
+ for (auto *Succ : successors(BB))
+ if (Seen.insert(Succ).second)
+ Worklist.push_back(Succ);
+ }
+ return {std::move(CalleeCounterMap), std::move(CalleeCallsiteMap)};
+}
+
+llvm::InlineResult llvm::InlineFunction(CallBase &CB, InlineFunctionInfo &IFI,
+ CtxProfAnalysis::Result &CtxProf,
+ bool MergeAttributes,
+ AAResults *CalleeAAR,
+ bool InsertLifetime,
+ Function *ForwardVarArgsTo) {
+ if (!CtxProf)
+ return InlineFunction(CB, IFI, MergeAttributes, CalleeAAR, InsertLifetime,
+ ForwardVarArgsTo);
+
+ auto &Caller = *CB.getCaller();
+ auto &Callee = *CB.getCalledFunction();
+ auto *StartBB = CB.getParent();
+
+ // Get some preliminary data about the callsite before it might get inlined.
+ // Inlining shouldn't delete the callee, but it's cleaner (and low-cost) to
+ // get this data upfront and rely less on InlineFunction's behavior.
+ const auto CalleeGUID = AssignGUIDPass::getGUID(Callee);
+ auto *CallsiteIDIns = CtxProfAnalysis::getCallsiteInstrumentation(CB);
+ const auto CallsiteID =
+ static_cast<uint32_t>(CallsiteIDIns->getIndex()->getZExtValue());
+
+ const auto NrCalleeCounters = CtxProf.getNrCounters(Callee);
+ const auto NrCalleeCallsites = CtxProf.getNrCallsites(Callee);
+
+ auto Ret = InlineFunction(CB, IFI, MergeAttributes, CalleeAAR, InsertLifetime,
+ ForwardVarArgsTo);
+ if (!Ret.isSuccess())
+ return Ret;
----------------
mtrofin wrote:
The contextual profile never shows a <caller, inlined-callee> pair. It shows <caller, callee> pairs. Whether the post-instrumentation inliner pass inlined or not functions is immaterial - because it'd be decisions about inlining on an instrumented binary - and also transparent - because the instrumentation mechanism doesn't care how we get from a callsite instrumentation to the entry BB of a function.
All that is about the instrumented run and explains what the profile contains: call chains (with the observation above). Here we're doing profile *use*, though. There might be confusion about "why instrumentation...": we carry around instrumentation so we can use it as a marker - to tie counter values back to BBs later, when we flatten and lower the profile.
So at this stage, any inlining decision may be unsuccessful for whatever legal reason an inlining could be unsuccessful, irrespective of contextual profiling or not (we mostly try to avoid trying to inline callsites that can't be inlined, but IIUC there are some cases that only the actual attempt uncovers)
https://github.com/llvm/llvm-project/pull/106154
More information about the llvm-commits
mailing list