[PATCH] D140908: [MemProf] Context disambiguation cloning pass [patch 1a/3]

Mon Mar 20 14:31:24 PDT 2023

snehasish accepted this revision.
snehasish added a comment.
This revision is now accepted and ready to land.

lgtm, overall. The remaining comments are minor and thanks for the explanations in the tests.

================
Comment at: llvm/include/llvm/Transforms/IPO/MemProfContextDisambiguation.h:22
+#include "llvm/IR/PassManager.h"
+#include <functional>
+
----------------
This header seems unused in this diff?

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:17
+// The transformations can be performed either directly on IR (regular LTO), or
+// on a ThinLTO index (and later applied to the IR during the ThinLTO backend).
+// Both types of LTO operate on a the same base graph representation, which
----------------
tejohnson wrote:
> snehasish wrote:
> > nit: I don't think we introduce the ThinLTO changes in this patch so maybe update this text in the follow on patches instead? 
> Instead of removing the ThinLTO comment, I added "(eventually)" in front of it. Otherwise the CRTP design choice doesn't make sense.
Yes, that sounds good to me.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:617
+  // Use LastContextId as a uniq id for MIB allocation nodes.
+  AllocNode->OrigStackOrAllocId = LastContextId;
+  // Alloc type should be updated as we add in the MIBs. We should assert
----------------
tejohnson wrote:
> snehasish wrote:
> > Are we marking the first AllocNode we encounter as a clone since
> > * LastContextId starts from 0
> > * LastContextId is incremented in addStackNodesForMIB (and duplicateContextIds)
> > * Comment at L183 indicates that 0 is used to identify a clone
> > 
> > A simple fix would be to start LastContextId from 1 in the constructor? I found the update to LastContextId hard to follow but I don't have a suggestion on how this could be made simpler. 
> > 
> > Also if this is a bug, then maybe a unit test to check the consistency of internal data structures might be useful.
> The first AllocNode indeed will have 0 value here. However, that isn't a bug or in conflict with the comment, although I could see how it might be confusing. For alloc nodes we simply use LastContextId to give them unique ids that are only used in dot dumping (see the first sentence in the comment for OrigStackOrAllocId. For stack nodes this is the stack id corresponding to the node (i.e. from the metadata). The comment is just noting that clones will retain a 0 value for this field, but that it isn't an issue since we only use it during matching of callsite metadata (when the graph is being built). I rewrote the comment on L183, hopefully that clarifies things.
> 
> The reason why we increment LastContextId before using it in addStackNodesForMIB is that that is where we are using it for its main purpose - to assign a unique (non-zero) id to each allocation context from memprof metadata.
Thanks for the explanation. It makes sense to me with the updated comment.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1014
+
+      // Start after the last Id, which was handled once outside for all Calls.
+      for (auto IdIter = Ids.rbegin() + 1; IdIter != Ids.rend(); IdIter++) {
----------------
"after the last" sounds confusing, perhaps callout we are iterating backwards?

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1129
+                                                 unsigned CloneNo) const {
+  return (Call->getFunction()->getName() + " -> " +
+          cast<CallBase>(Call)->getCalledFunction()->getName())
----------------
nit: Use Twine for concat?

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1190
+
+  if (DumpCCG) {
+    dbgs() << "CCG before updating call stack chains:\n";
----------------
tejohnson wrote:
> snehasish wrote:
> > I think we should move the logic from L1190 to L1206 (end of the constructor) to the process method. It was a bit surprising for me to the process method not actually do any processing .. :)
> > 
> > Also this would bring all the dump and export calls in one place to follow sequentially.
> I would prefer not to do this, since updateStackNodes called below is part of the graph building and should be here in the graph constructor.
> 
> Note that process() is currently empty only because the graph transformations were split into patches 2 and 3.
Got it, as I worked my way through updateStackNodes I realized that it's still just set up.

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1494
+        (Twine("OrigId: ") + (Node->IsAllocation ? "Alloc" : "") +
+         std::to_string(Node->OrigStackOrAllocId))
+            .str();
----------------
std::to_string(int) can be replaced with Twine(int) directly to avoid redundant conversions. Also in other locations apart from this one.

https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/Twine.h#L347

================
Comment at: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:1513
+  static std::string getNodeAttributes(NodeRef Node, GraphType) {
+    std::string AttributeString = (Twine("tooltip=\"") + getNodeId(Node) + " " +
+                                   getContextIds(Node->ContextIds) + "\"")
----------------
Maybe just declare AttributeString as Twine and change the return statement to `return AttributeString.str();` ?

Similar pattern of usage below too if choose to update this one.

================
Comment at: llvm/test/Transforms/MemProfContextDisambiguation/inlined2.ll:9
+;;
+;; char *bar() __attribute((noinline)) {
+;;   return new char[10];
----------------
Should this have trailing underscores too? i.e `__attribute__((noilnine))`

================
Comment at: llvm/test/Transforms/MemProfContextDisambiguation/inlined2.ll:56
+entry:
+  %call = call noalias noundef nonnull dereferenceable(10) ptr @_Znam(i64 noundef 10) #7, !memprof !7, !callsite !12, !heapallocsite !13
+  ret ptr %call
----------------
I guess once we reduce the tests, the heapallocsite metadata will be dropped.

On a somewhat related note, should we update the code below (in a separate patch) to drop memprof metadata too?
https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/DebugInfo.cpp#L852

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140908/new/

https://reviews.llvm.org/D140908