[PATCH] D140908: [MemProf] Context disambiguation cloning pass [patch 1a/3]
Snehasish Kumar via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 15 16:55:54 PDT 2023
snehasish added a comment.
Still working my way through `updateStackNodes`. Here are some of the comments I have so far --
For the IR based tests, I noticed some comments which describe how the code was compiled. I wonder if it makes sense to use synthetic IR (derived from the original source) to make the tests more readable and less brittle. I took the indirect.ll test and ran it through llvm-reduce. and found that the reduced IR was much easier to follow. I attached my test.sh script and the reduced IR for indirect.ll. To run the reducer, `llvm-reduce --test=test.sh ../path/to/indirect.ll`. Let me know what you think.
F26810333: reduced.ll <https://reviews.llvm.org/F26810333>
F26810332: test.sh <https://reviews.llvm.org/F26810332>
================
Comment at: llvm/include/llvm/Transforms/IPO/MemProfContextDisambiguation.h:29
+ : public PassInfoMixin<MemProfContextDisambiguation> {
+ /// Run the context disambiguator on \p TheModule, returns true if any changes
+ /// was made.
----------------
I think `\p` is for parameters but here the parameter name is just `M`?
https://www.doxygen.nl/manual/commands.html#cmdp
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:17
+// The transformations can be performed either directly on IR (regular LTO), or
+// on a ThinLTO index (and later applied to the IR during the ThinLTO backend).
+// Both types of LTO operate on a the same base graph representation, which
----------------
nit: I don't think we introduce the ThinLTO changes in this patch so maybe update this text in the follow on patches instead?
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:297
+ void addStackNodesForMIB(ContextNode *AllocNode,
+ CallStack<NodeT, IteratorT> StackContext,
+ CallStack<NodeT, IteratorT> &CallsiteContext,
----------------
Should the StackContext be a reference too? On the other hand, these are iterators so perhaps neither should be since they are lightweight.
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:312
+ /// iteration.
+ std::map<FuncTy *, std::vector<CallInfo>> FuncToCallsWithMetadata;
+
----------------
Iterating on the contents of this map (L935) is non-deterministic since we use a pointer as key. It would be nice to avoid non-determinism by using a MapVector even though I don't think there is any externally visible effect.
For NodeToCallingFunc below, we don't iterate on the contents so I think that's fine.
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:350
+
+ /// Get the stack id corresponding to the given Id or Index (for IR this will
+ /// return itself, for a summary index this will return the id recorded in the
----------------
Unclosed parenthesis?
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:398
+ ContextNode *
+ moveEdgeToNewCalleeClone(const std::shared_ptr<ContextEdge> &Edge,
+ EdgeIter *CallerEdgeI = nullptr);
----------------
This method seems to be unused in this diff. If this is used in a subsequent diff then can we move it there? Also for `moveEdgeToExistingCalleeClone` which is only called by this method.
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:544
+ std::shared_ptr<ContextEdge> Edge = std::make_shared<ContextEdge>(
+ this, Caller, (uint8_t)AllocType, ContextIdSet);
+ CallerEdges.push_back(Edge);
----------------
Can we inline the `{ContextId}` list initializer here directly?
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:617
+ // Use LastContextId as a uniq id for MIB allocation nodes.
+ AllocNode->OrigStackOrAllocId = LastContextId;
+ // Alloc type should be updated as we add in the MIBs. We should assert
----------------
Are we marking the first AllocNode we encounter as a clone since
* LastContextId starts from 0
* LastContextId is incremented in addStackNodesForMIB (and duplicateContextIds)
* Comment at L183 indicates that 0 is used to identify a clone
A simple fix would be to start LastContextId from 1 in the constructor? I found the update to LastContextId hard to follow but I don't have a suggestion on how this could be made simpler.
Also if this is a bug, then maybe a unit test to check the consistency of internal data structures might be useful.
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:935
+ DenseMap<uint64_t, std::vector<CallContextInfo>> StackIdToMatchingCalls;
+ for (auto &FuncEntry : FuncToCallsWithMetadata) {
+ auto *Func = FuncEntry.first;
----------------
We can use a structured binding here - `for (auto& [Func, CallsWithMetadata] : FuncToCallsWithMetadata)`
Also `auto& Call` below to avoid a copy of the pair.
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:965
+ if (Calls.size() == 1) {
+ auto &[Call, Ids, Func, SavedContextIds] = Calls[0];
+ if (Ids.size() == 1)
----------------
How about `auto& Ids = std::get<1>(Calls[0]);` since we don't need to unpack the other elements (in this if block)?
Same for functor passed to the std::sort function below.
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1163
+ for (auto &I : BB) {
+ if (auto *MemProfMD = I.getMetadata(LLVMContext::MD_memprof)) {
+ FuncToCallsWithMetadata[&F].push_back(&I);
----------------
Can we check if I isa<CallInst> (and invoke etc) before querying for metadata? I think this is simpler but a type check might speed things up by avoiding linear metadata attachment scans[1].
[1] https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/Metadata.cpp#L1239
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1167
+ CallStack<MDNode, MDNode::op_iterator> CallsiteContext(
+ I.getMetadata(LLVMContext::MD_callsite));
+ // Add all of the MIBs and their stack nodes.
----------------
Should we assert `I.getMetadata(LLVMContext::MD_callsite)` is not null?
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1177
+ assert(AllocNode->AllocTypes != (uint8_t)AllocationType::None);
+ NodeToCallingFunc[AllocNode] = &F;
+ // Memprof and callsite metadata on memory allocations no longer
----------------
nit: we can move this bookkeeping to `addAllocNode` (similar to how we update `AllocationCallToContextNodeMap` in `addAllocNode`).
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1190
+
+ if (DumpCCG) {
+ dbgs() << "CCG before updating call stack chains:\n";
----------------
I think we should move the logic from L1190 to L1206 (end of the constructor) to the process method. It was a bit surprising for me to the process method not actually do any processing .. :)
Also this would bring all the dump and export calls in one place to follow sequentially.
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1200
+
+ // Strip off remaining callsite metadata, no longer needed.
+ for (auto &FuncEntry : FuncToCallsWithMetadata)
----------------
It would be nice to move this cleanup to the end even though it does not affect `handleCallsitesWithMultipleTargets`.
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1202
+ for (auto &FuncEntry : FuncToCallsWithMetadata)
+ for (auto Call : FuncEntry.second)
+ Call.call()->setMetadata(LLVMContext::MD_callsite, nullptr);
----------------
auto& to avoid a copy?
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1257
+ return false;
+ auto CalleeVal = CB->getCalledOperand()->stripPointerCasts();
+ auto *CalleeFunc = dyn_cast<Function>(CalleeVal);
----------------
nit: This can be auto* too like the one below.
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1355
+template <typename DerivedCCG, typename FuncTy, typename CallTy>
+void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::print(
+ raw_ostream &OS) const {
----------------
Now that we have GraphTraits, can we use the graph iterators instead of our own DFS here?
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1378
+static void checkNode(const ContextNode<DerivedCCG, FuncTy, CallTy> *Node,
+ bool CheckEdges = false) {
+ // Node's context ids should be the union of both its callee and caller edge
----------------
I don't see CheckEdges set to true in this diff. I think we should remove it altogether or make it always check by default if we are calling verify. What do you think?
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1429
+void CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::check() const {
+ DenseSet<const ContextNode *> Visited;
+ for (auto &Entry : AllocationCallToContextNodeMap) {
----------------
Now that we have GraphTraits, can we use the graph iterators instead of our own DFS here?
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1443
+ using NodePtrTy = std::unique_ptr<ContextNode<DerivedCCG, FuncTy, CallTy>>;
+ static NodeRef GetNode(const NodePtrTy &P) { return P.get(); }
+
----------------
s/GetNode/getNode to conform to llvm style. I think it's only used in the nodes_* functions below so we should be good.
Same for GetCallee below.
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1484
+ : public DefaultDOTGraphTraits {
+ DOTGraphTraits(bool isSimple = false) : DefaultDOTGraphTraits(isSimple) {}
+
----------------
s/isSimple/IsSimple?
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1563
+ return "brown1";
+ else if (AllocTypes == (uint8_t)AllocationType::Cold)
+ return "cyan";
----------------
We can omit the "else" part of "else if" after the return statements.
================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1574
+ static std::string getNodeId(NodeRef Node) {
+ std::stringstream sstream;
+ sstream << std::hex << "N0x" << (unsigned long long)Node;
----------------
sstream and result variable names should be InitCaps?
================
Comment at: llvm/test/Transforms/MemProfContextDisambiguation/duplicate-context-ids2.ll:396
+^0 = module: (path: "<stdin>", hash: (0, 0, 0, 0, 0))
+^1 = gv: (name: "_Z2A2v", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 8, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 0, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), calls: ((callee: ^6)), callsites: ((callee: ^6, clones: (0), stackIds: (11918633778629885638, 12693505993681534773)))))) ; guid = 745609562354838949
+^2 = gv: (name: "_Z1Fv", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 2, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 0, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), calls: ((callee: ^6)), callsites: ((callee: ^6, clones: (0), stackIds: (13543580133643026784)))))) ; guid = 882063111217488233
----------------
I don't think the Thin-LTO summaries are used in this patch. Not opposed to leaving it in for the next patch. Just curious whether this was intentional.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D140908/new/
https://reviews.llvm.org/D140908
More information about the llvm-commits
mailing list