[PATCH] D45757: [XRay][profiler] Part 2: XRay Function Call Trie

Keith via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun May 13 22:34:23 PDT 2018


kpw added a comment.

This was easier to review than I expected. I found it easier to follow than the allocator/array CL. Sorry for the long delay!



================
Comment at: compiler-rt/lib/xray/tests/unit/function_call_trie_test.cc:56
+
+TEST(FunctionCallTrieTest, MissingFunctionEntry) {
+  thread_local auto A = FunctionCallTrie::InitAllocators();
----------------
Might as well test MissingFunctionExit as well for symmetry.


================
Comment at: compiler-rt/lib/xray/tests/unit/function_call_trie_test.cc:83-91
+  const auto R0 = R[0];
+  ASSERT_NE(R0, nullptr);
+  EXPECT_EQ(R0->CallCount, 1u);
+  EXPECT_EQ(R0->CumulativeLocalTime, 1u);
+
+  const auto R1 = R[1];
+  ASSERT_NE(R1, nullptr);
----------------
The way this test is written, the two roots look exactly identical. There is a possible error case where the trie returns the same root twice that is undetected.

You might have one function spend more time so that this is detected. Then you might want the test to assert ignoring order. Up to you whether it's worth it.


================
Comment at: compiler-rt/lib/xray/tests/unit/function_call_trie_test.cc:140-142
+  EXPECT_EQ(F3.CumulativeLocalTime, 100);
+  EXPECT_EQ(F2.CumulativeLocalTime, 300);
+  EXPECT_EQ(F1.CumulativeLocalTime, 100);
----------------
Imho, this would be easier to interpret if the nodes captured CumulativeTreeTime instead of CumulativeLocalTime.

Then it would be
F3 -> 100
F2 -> 300
F1 -> 400


================
Comment at: compiler-rt/lib/xray/tests/unit/function_call_trie_test.cc:190-192
+  auto A = FunctionCallTrie::InitAllocators();
+  FunctionCallTrie T0(A);
+  FunctionCallTrie T1(A);
----------------
Is sharing an initialized allocator expected to be OK? Why not use the same allocator for FunctionCallTrie Merged below then?


================
Comment at: compiler-rt/lib/xray/tests/unit/function_call_trie_test.cc:218-220
+  EXPECT_EQ(R0.FId, 1);
+  EXPECT_EQ(R0.CallCount, 2);
+  EXPECT_EQ(R0.CumulativeLocalTime, 10);
----------------
Only check the root? Theres only two other nodes to check, might as well verify them.


================
Comment at: compiler-rt/lib/xray/xray_function_call_trie.h:37
+///
+/// If we visulaise this data structure, we'll find the following potential
+/// representation:
----------------
visulaise -> visualize


================
Comment at: compiler-rt/lib/xray/xray_function_call_trie.h:95
+  // standard library types in this header.
+  struct NodeRef {
+    Node *NodePtr;
----------------
There's no reference members here. Maybe call it NodeIdPair?


================
Comment at: compiler-rt/lib/xray/xray_function_call_trie.h:217
+    }
+  }; // namespace __xray
+
----------------
Don't think so. This looks like the end of "struct Allocators {" to me.


================
Comment at: compiler-rt/lib/xray/xray_function_call_trie.h:318
+      TopNode->CallCount++;
+      TopNode->CumulativeLocalTime += LocalTime - CumulativeTreeTime;
+      CumulativeTreeTime += LocalTime;
----------------
To me, it is less intuitive to capture CumulativeLocalTime than CumulativeTreeTime in each Node. With one you can compute the other given the callees, but if I'm analyzing function latencies, I care about time spent in the callees just as much.

Why did you make this choice?


================
Comment at: compiler-rt/lib/xray/xray_function_call_trie.h:336-337
+  //
+  // This operation will *not* destroy the state in `O`, and thus may cause some
+  // duplicate entries in `O` if it is not empty.
+  void deepCopyInto(FunctionCallTrie &O) const {
----------------
Should this be called with non-empty destinations? Can we just CHECK it is not. Having duplicate roots stinks.


================
Comment at: compiler-rt/lib/xray/xray_function_call_trie.h:357
+      Stack DFSStack(StackAllocator);
+      DFSStack.Append(NodeAndParent{Root, NewRoot});
+      while (!DFSStack.empty()) {
----------------
Do you have to handle the Allocator failing here with a null check? Are you just relying on those cases artificially pruning the traversal?


================
Comment at: compiler-rt/lib/xray/xray_function_call_trie.h:380
+  // data from the original (current) trie, along with all its callees.
+  void mergeInto(FunctionCallTrie &O) const {
+    for (const auto Root : getRoots()) {
----------------
What's the thread safety of this and deepCopy? It seems like they  shouldn't be called when functions are being intercepted. How can we make sure that invariant is preserved?


================
Comment at: compiler-rt/lib/xray/xray_function_call_trie.h:393-401
+      struct NodeAndTarget {
+        FunctionCallTrie::Node *OrigNode;
+        FunctionCallTrie::Node *TargetNode;
+      };
+      using Stack = Array<NodeAndTarget>;
+
+      typename Stack::AllocatorType StackAllocator(
----------------
Should this be pulled out of the root loop so we can use a single stack and array instead of 1 per root node?


================
Comment at: compiler-rt/lib/xray/xray_function_call_trie.h:408-409
+        DFSStack.trim(1);
+        NT.TargetNode->CallCount += NT.OrigNode->CallCount;
+        NT.TargetNode->CumulativeLocalTime += NT.OrigNode->CumulativeLocalTime;
+        for (const auto Callee : NT.OrigNode->Callees) {
----------------
You have some TODOs elsewhere to update the histograms. Might put one here as well.


================
Comment at: compiler-rt/lib/xray/xray_profiler_flags.cc:35
+#define XRAY_FLAG(Type, Name, DefaultValue, Description)                       \
+  RegisterFlag(P, #Name, Description, &F->Name);
+#include "xray_profiler_flags.inc"
----------------
Is the "#Name" intentional? I'm not very fluent in macros, but that popped out.


https://reviews.llvm.org/D45757





More information about the llvm-commits mailing list