[PATCH] D140908: [MemProf] Context disambiguation cloning pass [patch 1a/3]

Wed Mar 15 16:55:54 PDT 2023

snehasish added a comment.

Still working my way through `updateStackNodes`. Here are some of the comments I have so far --

For the IR based tests, I noticed some comments which describe how the code was compiled. I wonder if it makes sense to use synthetic IR (derived from the original source) to make the tests more readable and less brittle. I took the indirect.ll test and ran it through llvm-reduce. and found that the reduced IR was much easier to follow. I attached my test.sh script and the reduced IR for indirect.ll. To run the reducer, `llvm-reduce --test=test.sh ../path/to/indirect.ll`. Let me know what you think.

F26810333: reduced.ll <https://reviews.llvm.org/F26810333>

F26810332: test.sh <https://reviews.llvm.org/F26810332>

================
Comment at: llvm/include/llvm/Transforms/IPO/MemProfContextDisambiguation.h:29
+    : public PassInfoMixin<MemProfContextDisambiguation> {
+  /// Run the context disambiguator on \p TheModule, returns true if any changes
+  /// was made.
----------------
I think `\p` is for parameters but here the parameter name is just `M`?

https://www.doxygen.nl/manual/commands.html#cmdp

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:17
+// The transformations can be performed either directly on IR (regular LTO), or
+// on a ThinLTO index (and later applied to the IR during the ThinLTO backend).
+// Both types of LTO operate on a the same base graph representation, which
----------------
nit: I don't think we introduce the ThinLTO changes in this patch so maybe update this text in the follow on patches instead? 

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:297
+  void addStackNodesForMIB(ContextNode *AllocNode,
+                           CallStack<NodeT, IteratorT> StackContext,
+                           CallStack<NodeT, IteratorT> &CallsiteContext,
----------------
Should the StackContext be a reference too? On the other hand, these are iterators so perhaps neither should be since they are lightweight.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:312
+  /// iteration.
+  std::map<FuncTy *, std::vector<CallInfo>> FuncToCallsWithMetadata;
+
----------------
Iterating on the contents of this map (L935) is non-deterministic since we use a pointer as key. It would be nice to avoid non-determinism by using a MapVector even though I don't think there is any externally visible effect.

For NodeToCallingFunc below, we don't iterate on the contents so I think that's fine. 

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:350
+
+  /// Get the stack id corresponding to the given Id or Index (for IR this will
+  /// return itself, for a summary index this will return the id recorded in the
----------------
Unclosed parenthesis?

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:398
+  ContextNode *
+  moveEdgeToNewCalleeClone(const std::shared_ptr<ContextEdge> &Edge,
+                           EdgeIter *CallerEdgeI = nullptr);
----------------
This method seems to be unused in this diff. If this is used in a subsequent diff then can we move it there? Also for `moveEdgeToExistingCalleeClone` which is only called by this method.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:544
+  std::shared_ptr<ContextEdge> Edge = std::make_shared<ContextEdge>(
+      this, Caller, (uint8_t)AllocType, ContextIdSet);
+  CallerEdges.push_back(Edge);
----------------
Can we inline the `{ContextId}` list initializer here directly?

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:617
+  // Use LastContextId as a uniq id for MIB allocation nodes.
+  AllocNode->OrigStackOrAllocId = LastContextId;
+  // Alloc type should be updated as we add in the MIBs. We should assert
----------------
Are we marking the first AllocNode we encounter as a clone since
* LastContextId starts from 0
* LastContextId is incremented in addStackNodesForMIB (and duplicateContextIds)
* Comment at L183 indicates that 0 is used to identify a clone

A simple fix would be to start LastContextId from 1 in the constructor? I found the update to LastContextId hard to follow but I don't have a suggestion on how this could be made simpler. 

Also if this is a bug, then maybe a unit test to check the consistency of internal data structures might be useful.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:935
+  DenseMap<uint64_t, std::vector<CallContextInfo>> StackIdToMatchingCalls;
+  for (auto &FuncEntry : FuncToCallsWithMetadata) {
+    auto *Func = FuncEntry.first;
----------------
We can use a structured binding here - `for (auto& [Func, CallsWithMetadata] : FuncToCallsWithMetadata)`

Also `auto& Call` below to avoid a copy of the pair.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:965
+    if (Calls.size() == 1) {
+      auto &[Call, Ids, Func, SavedContextIds] = Calls[0];
+      if (Ids.size() == 1)
----------------
How about `auto& Ids = std::get<1>(Calls[0]);` since we don't need to unpack the other elements (in this if block)?

Same for functor passed to the std::sort function below.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1163
+      for (auto &I : BB) {
+        if (auto *MemProfMD = I.getMetadata(LLVMContext::MD_memprof)) {
+          FuncToCallsWithMetadata[&F].push_back(&I);
----------------
Can we check if I isa<CallInst> (and invoke etc) before querying for metadata? I think this is simpler but a type check might speed things up by avoiding linear metadata attachment scans[1].

[1] https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/Metadata.cpp#L1239

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1167
+          CallStack<MDNode, MDNode::op_iterator> CallsiteContext(
+              I.getMetadata(LLVMContext::MD_callsite));
+          // Add all of the MIBs and their stack nodes.
----------------
Should we assert `I.getMetadata(LLVMContext::MD_callsite)` is not null?

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1177
+          assert(AllocNode->AllocTypes != (uint8_t)AllocationType::None);
+          NodeToCallingFunc[AllocNode] = &F;
+          // Memprof and callsite metadata on memory allocations no longer
----------------
nit: we can move this bookkeeping to `addAllocNode` (similar to how we update `AllocationCallToContextNodeMap` in `addAllocNode`).

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1190
+
+  if (DumpCCG) {
+    dbgs() << "CCG before updating call stack chains:\n";
----------------
I think we should move the logic from L1190 to L1206 (end of the constructor) to the process method. It was a bit surprising for me to the process method not actually do any processing .. :)

Also this would bring all the dump and export calls in one place to follow sequentially.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1200
+
+  // Strip off remaining callsite metadata, no longer needed.
+  for (auto &FuncEntry : FuncToCallsWithMetadata)
----------------
It would be nice to move this cleanup to the end even though it does not affect `handleCallsitesWithMultipleTargets`. 

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1202
+  for (auto &FuncEntry : FuncToCallsWithMetadata)
+    for (auto Call : FuncEntry.second)
+      Call.call()->setMetadata(LLVMContext::MD_callsite, nullptr);
----------------
auto& to avoid a copy?

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1257
+    return false;
+  auto CalleeVal = CB->getCalledOperand()->stripPointerCasts();
+  auto *CalleeFunc = dyn_cast<Function>(CalleeVal);
----------------
nit: This can be auto* too like the one below.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1355
+template <typename DerivedCCG, typename FuncTy, typename CallTy>
+void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::print(
+    raw_ostream &OS) const {
----------------
Now that we have GraphTraits, can we use the graph iterators instead of our own DFS here?

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1378
+static void checkNode(const ContextNode<DerivedCCG, FuncTy, CallTy> *Node,
+                      bool CheckEdges = false) {
+  // Node's context ids should be the union of both its callee and caller edge
----------------
I don't see CheckEdges set to true in this diff. I think we should remove it altogether or make it always check by default if we are calling verify. What do you think?

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1429
+void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::check() const {
+  DenseSet<const ContextNode *> Visited;
+  for (auto &Entry : AllocationCallToContextNodeMap) {
----------------
Now that we have GraphTraits, can we use the graph iterators instead of our own DFS here?

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1443
+  using NodePtrTy = std::unique_ptr<ContextNode<DerivedCCG, FuncTy, CallTy>>;
+  static NodeRef GetNode(const NodePtrTy &P) { return P.get(); }
+
----------------
s/GetNode/getNode to conform to llvm style. I think it's only used in the nodes_* functions below so we should be good.

Same for GetCallee below.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1484
+    : public DefaultDOTGraphTraits {
+  DOTGraphTraits(bool isSimple = false) : DefaultDOTGraphTraits(isSimple) {}
+
----------------
s/isSimple/IsSimple?

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1563
+      return "brown1";
+    else if (AllocTypes == (uint8_t)AllocationType::Cold)
+      return "cyan";
----------------
We can omit the "else" part of "else if" after the return statements.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1574
+  static std::string getNodeId(NodeRef Node) {
+    std::stringstream sstream;
+    sstream << std::hex << "N0x" << (unsigned long long)Node;
----------------
sstream and result variable names should be InitCaps?

================
Comment at: llvm/test/Transforms/MemProfContextDisambiguation/duplicate-context-ids2.ll:396
+^0 = module: (path: "<stdin>", hash: (0, 0, 0, 0, 0))
+^1 = gv: (name: "_Z2A2v", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 8, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 0, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), calls: ((callee: ^6)), callsites: ((callee: ^6, clones: (0), stackIds: (11918633778629885638, 12693505993681534773)))))) ; guid = 745609562354838949
+^2 = gv: (name: "_Z1Fv", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 2, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 0, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), calls: ((callee: ^6)), callsites: ((callee: ^6, clones: (0), stackIds: (13543580133643026784)))))) ; guid = 882063111217488233
----------------
I don't think the Thin-LTO summaries are used in this patch. Not opposed to leaving it in for the next patch. Just curious whether this was intentional.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140908/new/

https://reviews.llvm.org/D140908