[PATCH] D45758: [XRay][profiler] Part 3: Profile Collector Service

Keith via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 17 00:54:16 PDT 2018


kpw added a comment.

My overall feedback about this code is that after reading it, it probably deserves another pass over just to focus on what allocates and frees and which state transitions are responsible for closing the loops.

This is an easy thing to get wrong or underspecify, so it might be worth putting your skeptic hat on and picking apart the contracts as well.

Otherwise, a few questions but I think it makes sense for the most part.



================
Comment at: compiler-rt/lib/xray/tests/unit/profile_collector_test.cc:72
+  auto B = profileCollectorService::nextBuffer({nullptr, 0});
+  ValidateBlock(B);
+}
----------------
Would be better (and more difficult) to validate the contents of the block in addition to the header. This doesn't verify that any of the profiling data is correct and consistent with the Trie.


================
Comment at: compiler-rt/lib/xray/xray_profile_collector.cc:71
+    Item->Trie = reinterpret_cast<FunctionCallTrie *>(
+        InternalAlloc(sizeof(FunctionCallTrie)));
+    new (Item->Trie) FunctionCallTrie(*GlobalAllocators);
----------------
Why should we be using InternalAlloc instead of GlobalAllocators for these? Seems counterintuitive but this is a lot of stuff to keep track of so I could be thinking about it wrong.


================
Comment at: compiler-rt/lib/xray/xray_profile_collector.cc:85
+
+  // The Path in this record is in reverse order.
+  PathArray *Path = nullptr;
----------------
Reverse order is not as clear as "ordered from the deepest callee's funcId to the root caller's funcId" or something like that.

Is the funcId for the Node also also in the array or is it elided?


================
Comment at: compiler-rt/lib/xray/xray_profile_collector.cc:104
+
+// We go through a FunctionCallTrie and traverse from the root, in DFS fashion,
+// to generate the path(s) and output the data.
----------------
Nit: Could be stated "We walk a depth first traversal of the FunctionCallTrie from the root to generate ..."


================
Comment at: compiler-rt/lib/xray/xray_profile_collector.cc:109-112
+    using StackArray = Array<const FunctionCallTrie::Node *>;
+    using StackAllocator = typename StackArray::AllocatorType;
+    StackAllocator StackAlloc(profilerFlags()->stack_allocator_max, 0);
+    StackArray DFSStack(StackAlloc);
----------------
This works equally well with the stack moved out of the loop, saving reallocation for each root without consequence for understanding.


================
Comment at: compiler-rt/lib/xray/xray_profile_collector.cc:144-148
+    // Add the sentinel here.
+    constexpr int32_t SentinelFId = 0;
+    NextPtr = static_cast<char *>(
+                  internal_memset(NextPtr, SentinelFId, sizeof(SentinelFId))) +
+              sizeof(SentinelFId);
----------------
Does PathArray have a constant time size method? I've never regretted a length-prefix encoding compared to sentinel termination where possible.

Edit: Just checked xray_segmented_array.h and size is constant time. I think we can design a seekable format (seek by record) without space or time overhead and I'd encourage a seekable format as it's more flexible.


================
Comment at: compiler-rt/lib/xray/xray_profile_collector.cc:160-164
+    // Add an end of record sentinel here.
+    constexpr uint64_t EndOfRecord = 0x0;
+    NextPtr = static_cast<char *>(
+                  internal_memset(NextPtr, EndOfRecord, sizeof(EndOfRecord))) +
+              sizeof(EndOfRecord);
----------------
Does having a record delimiter actually get us anything meaningful? The format is not seekable by record as is. If we could make it seekable without adding overhead that would be best, otherwise, what do you think about removing the delimiter since it's implied by the position of the end of function ids sentinel?

Edit: Block header has total size. That might be sufficient and we can decide whether to remove record delimiter or keep it as sanity check framing.


================
Comment at: compiler-rt/lib/xray/xray_profile_collector.cc:218
+    Buffer->Size = sizeof(Header) + CumulativeSizes;
+    Buffer->Data = InternalAlloc(Buffer->Size, nullptr, 64);
+    DCHECK_NE(Buffer->Data, nullptr);
----------------
Looks like this is InternalFreed on the next call to serialize. It seems more simple to scope the allocation and free-ing to a single call to serialize. Why are we doing it the other way?

Edit: This is the exfiltration mechanism via the nextBuffer() function.


================
Comment at: compiler-rt/lib/xray/xray_profile_collector.cc:258-259
+XRayBuffer nextBuffer(XRayBuffer B) {
+  // FIXME: Find a way to identify the buffers, somehow, from within this
+  // function.
+  SpinMutexLock Lock(&GlobalMutex);
----------------
I don't understand what's meant here. :(


================
Comment at: compiler-rt/lib/xray/xray_profile_collector.h:33
+
+/// Posts the FunctionCallTrie assocaited with a specific Thread ID. This
+/// will:
----------------
assocaited ->associated


================
Comment at: compiler-rt/lib/xray/xray_profile_collector.h:48
+/// process their own thread-local FunctionCallTrie instances, we're removing
+/// the need for synchronisation across threads while we're profiling.
+/// However, once we're done profiling, we can then collect copies of these
----------------
This pushes the need to synchronize into the caller that controls when the state can change from FINALIZED to INITIALIZED.

It's too big of a change to introduce a new state, but we should either.
1. Point out that this synchronization is necessary by the caller in this documentation or
2. Have the __xray_init or whatever calls fail when this processing is occuring.


================
Comment at: compiler-rt/lib/xray/xray_profile_collector.h:67
+///   - list of records:
+///     - function ids in reverse call order, from leaf to root, terminated by
+///       0 (32 bits per function id)
----------------
"reverse call order" is unclear for a Trie. Do functions with multiple callees have duplicate entries? There are no parent pointers described here.

Edit: After reading through the CL, I realized I misunderstood the rest of the data (such as call count, etc.) to be associated with each entry in the PathArray, while in reality I think it's just the function ids. I think maybe this could be cleared up by using proto like language while keeping the indentation. E.g. 

repeated record:
   - repeated function_id ordered from callee to root:
   - call count
   - cumulative ...

There's nothing wrong with the current comment, but I misunderstood it and looking to clarify.


https://reviews.llvm.org/D45758





More information about the llvm-commits mailing list