[PATCH] D45758: [XRay][profiler] Part 3: Profile Collector Service

Keith via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 30 19:03:41 PDT 2018


kpw accepted this revision.
kpw added a comment.
This revision is now accepted and ready to land.

Aside from the discussion about a more invasive change to the way cooperative state transitions, this is fine. Thanks for writing the test for serialization.

As discussed that's a big enough patch it should live on its own. If you want to open a bug for it, that would also be great.



================
Comment at: compiler-rt/lib/xray/xray_profile_collector.cc:85
+
+  // The Path in this record is in reverse order.
+  PathArray *Path = nullptr;
----------------
dberris wrote:
> kpw wrote:
> > Reverse order is not as clear as "ordered from the deepest callee's funcId to the root caller's funcId" or something like that.
> > 
> > Is the funcId for the Node also also in the array or is it elided?
> The path is just the function id's of the function in a stack trace. This allows us to identify each call stack uniquely, in leaf-to-root order.
Yep. Makes sense. You can mark this as done.


================
Comment at: compiler-rt/lib/xray/xray_profile_collector.cc:144-148
+    // Add the sentinel here.
+    constexpr int32_t SentinelFId = 0;
+    NextPtr = static_cast<char *>(
+                  internal_memset(NextPtr, SentinelFId, sizeof(SentinelFId))) +
+              sizeof(SentinelFId);
----------------
dberris wrote:
> kpw wrote:
> > Does PathArray have a constant time size method? I've never regretted a length-prefix encoding compared to sentinel termination where possible.
> > 
> > Edit: Just checked xray_segmented_array.h and size is constant time. I think we can design a seekable format (seek by record) without space or time overhead and I'd encourage a seekable format as it's more flexible.
> Yes, PathArray is an `__xray::Array<...>` which stores the size as a member, so O(1) check on the size.
> 
> I thought about always having size prefixes, but that makes it much harder to write -- i.e. reserving space for the size of the data before writing it. The value for a seekable format isn't as high at this time, we just need a way to store this in a reasonably compact format, and assume it will be read in a streaming fashion.
I find the value in prefix length encodings that it's easier to find bugs with a codec that gets out of sync, but agree it's not high value at this time.


================
Comment at: compiler-rt/lib/xray/xray_profile_collector.h:48
+/// process their own thread-local FunctionCallTrie instances, we're removing
+/// the need for synchronisation across threads while we're profiling.
+/// However, once we're done profiling, we can then collect copies of these
----------------
dberris wrote:
> kpw wrote:
> > This pushes the need to synchronize into the caller that controls when the state can change from FINALIZED to INITIALIZED.
> > 
> > It's too big of a change to introduce a new state, but we should either.
> > 1. Point out that this synchronization is necessary by the caller in this documentation or
> > 2. Have the __xray_init or whatever calls fail when this processing is occuring.
> We already do this -- the "caller" in this case will be the XRay implementation that's using this service. In particular, the only user of this is the profiling mode implementation.
> 
> In particular, this function doesn't interfere with any of the higher level states -- all the synchronisation it needs is internal. The documentation is saying that callers of this function only need to worry about synchronising the FunctionCallTrie it has access to, if it needs to. The implementation we have in the profiling mode uses thread-local FunctionCallTrie objects, which means from the thread that has access to that trie, it doesn't need to synchronise anything.
> 
> The next patch implements the finalization routine, which deals directly with this service's APIs "correctly".
> 
> Is there something else that will make this documentation more clear from that perspective?
I phrased my question poorly. I'm not worried about thread synchronization. I was trying to reason through whether there are illegal sequences of possible state transitions within a thread.

As we discussed over chat, each generation of xray state transitions manages thread local state.
__xray_init sets up state.
__xray_flush tears it down and calls post().

Since these state transitions only run when threads are intercepted and the handlers are invoked, we can have problems where no instrumented function is called for a thread when in XRAY_FINALIZED or similar.

As a follow on patch, we discussed having thread local generation data that tracks the state transitions and cooperative progress across the per-thread state machines that each handler function participates in.


https://reviews.llvm.org/D45758





More information about the llvm-commits mailing list