[PATCH] D38073: [XRay][compiler-rt] Use pthread for initializing thread-local data

Dean Michael Berris via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 20 18:15:09 PDT 2017


dberris added a comment.

Thanks for the review!



================
Comment at: lib/xray/xray_fdr_logging_impl.h:110
 // call so that it can be initialized on first use instead of as a global.
-struct ThreadLocalData {
+struct alignas(64) ThreadLocalData {
   BufferQueue::Buffer Buffer;
----------------
dblaikie wrote:
> Presumably this should use LLVM_ALIGNAS?
> 
> Is this related to the rest of the patch, if so, how? Is it for correctness or performance?
It's related in that we'd like to use aligned uninitialised storage to store one of these objects. We'd the buffer to not only be correctly sized but also aligned correctly, since we're going to interpret the buffer's address as a different type (punning? aliasing? I can never get the right terminology for that). The performance benefit is a side-effect of the correctness issue (since we have non-trivially destructible things in it).

BTW, I can't use LLVM_ALIGNAS because that seems to be undefined in compiler-rt. :/


================
Comment at: lib/xray/xray_fdr_logging_impl.h:145
+// the FDR logging implementation. The implementation details require a bit of
+// care to maintain, so document the semantics that we want to achieve here.
+//
----------------
dblaikie wrote:
> "so document the semantics that we want to achieve here" seems weird - that sounds like a patch note or feature request (& somewhat redundant, it feels like - the reader can probably see that the semantics are documented below), rather than a statement of fact.
> 
> "the semantics are documented below" or maybe "The following semantics are intended:" or skip this comment entirely? Not sure.
I like skipping it. Stream of consciousness comments at the end of the day don't come out very well. :)


================
Comment at: lib/xray/xray_fdr_logging_impl.h:175-183
+// We're doing this to avoid using a `thread_local` object that has a
+// non-trivial destructor, because the C++ runtime might call std::malloc(...)
+// to register calls to destructors. Deadlocks may arise when, for example, an
+// externally provided malloc implementation is XRay instrumented, and
+// initializing the thread-locals involves calling into malloc. A malloc
+// implementation that does global synchronization might be holding a lock for a
+// critical section, calling a function that might be XRay instrumented (and
----------------
dblaikie wrote:
> Maybe this goes into more detail than needed if the XRay runtime in general shouldn't be using the standard library for all these reasons - that's a general requirement that wouldn't need to be rejustified at every use case/implementation detail?
This is specifically for the function, since the original implementation (and the potentially more obvious solution) uses a thread-exit guard type (i.e. RAII but using a thread-local). Other parts of the implementation have avoided using memory allocation routines because of performance reasons but not entirely for correctness.


================
Comment at: lib/xray/xray_fdr_logging_impl.h:190-191
+  thread_local pthread_once_t key_once = PTHREAD_ONCE_INIT;
+  thread_local std::aligned_storage<sizeof(ThreadLocalData),
+                                    alignof(ThreadLocalData)>::type TLSBuffer;
+  // Ensure that we only actually ever do the pthread initialization once.
----------------
dblaikie wrote:
> Seems the goal in general was/is to move away from C++ standard library usage in the runtime library, right? Is there an attempt to pick and choose which parts can be relied upon, or is the overall goal to actually remove C++ standard library usage entirely, like other parts of compiler-rt?
> 
> If not, then maybe std::aligned_union<1, ThreadLocalData> might be marginally simpler here, but only marginally. (& I wouldn't say it's strictly better - since a union of only one thing is a bit strange)
I think I should document why we're using a thread-local uninitialized memory at all in this situation. :)

First, the type ThreadLocalData isn't trivially destructable. If this was a thread_local object, we end up with the same problem we had in the first place -- having a thread_local object with a non-trivial destructor may cause the C++ runtime to call std::malloc(...) and introduce a deadlock when initialising the implementation as well as when destroying the object at thread-exit time.

Second, having uninitialised buffer space makes the initialisation of the memory trivial. We do want uninitialised bytes because we don't need to zero out the bytes, since we're going to initialize it once using placement new anyway.

So unfortunately, this is the solution that works and fits within the requirements we laid out. Updated with a bit more documentation on specifically why we're using an uninitialised aligned buffer. :)

Removing the dependency on standard C++ library things that have runtime issues I think is a good goal, but is orthogonal to this particular change (since really we're just avoiding the use of `thread_local` non-trivially destructible objects).


https://reviews.llvm.org/D38073





More information about the llvm-commits mailing list