[PATCH] D38073: [XRay][compiler-rt] Use pthread for initializing thread-local data

David Blaikie via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 27 10:47:30 PDT 2017


On Wed, Sep 20, 2017 at 6:15 PM Dean Michael Berris via Phabricator <
reviews at reviews.llvm.org> wrote:

> dberris added a comment.
>
> Thanks for the review!
>
>
>
> ================
> Comment at: lib/xray/xray_fdr_logging_impl.h:110
>  // call so that it can be initialized on first use instead of as a global.
> -struct ThreadLocalData {
> +struct alignas(64) ThreadLocalData {
>    BufferQueue::Buffer Buffer;
> ----------------
> dblaikie wrote:
> > Presumably this should use LLVM_ALIGNAS?
> >
> > Is this related to the rest of the patch, if so, how? Is it for
> correctness or performance?
> It's related in that we'd like to use aligned uninitialised storage to
> store one of these objects. We'd the buffer to not only be correctly sized
> but also aligned correctly,

since we're going to interpret the buffer's address as a different type
> (punning? aliasing?


Not sure I follow, the buffer:

 thread_local std::aligned_storage<sizeof(ThreadLocalData),
                                    alignof(ThreadLocalData)>::type
TLSBuffer;

is requested to be of the appropriate alignment for ThreadLocalData? So why
would ThreadLocalData need to request any specific alignment itself?

Or do you mean the contents of the Buffer member will be used to store
other things? Then perhaps it'd be more appropriate to put the alignment
attribute on the member? (& an assert somewhere that compares the alignment
of the thing being stored with the alignment of the buffer itself).


> I can never get the right terminology for that). The performance benefit
> is a side-effect of the correctness issue (since we have non-trivially
> destructible things in it).
>
> BTW, I can't use LLVM_ALIGNAS because that seems to be undefined in
> compiler-rt. :/
>

Is using the attribute directly sufficiently portable for compiler-rt? Are
there other uses of alignment in compiler-rt to compare/check against?


>
>
> ================
> Comment at: lib/xray/xray_fdr_logging_impl.h:145
> +// the FDR logging implementation. The implementation details require a
> bit of
> +// care to maintain, so document the semantics that we want to achieve
> here.
> +//
> ----------------
> dblaikie wrote:
> > "so document the semantics that we want to achieve here" seems weird -
> that sounds like a patch note or feature request (& somewhat redundant, it
> feels like - the reader can probably see that the semantics are documented
> below), rather than a statement of fact.
> >
> > "the semantics are documented below" or maybe "The following semantics
> are intended:" or skip this comment entirely? Not sure.
> I like skipping it. Stream of consciousness comments at the end of the day
> don't come out very well. :)
>
>
> ================
> Comment at: lib/xray/xray_fdr_logging_impl.h:175-183
> +// We're doing this to avoid using a `thread_local` object that has a
> +// non-trivial destructor, because the C++ runtime might call
> std::malloc(...)
> +// to register calls to destructors. Deadlocks may arise when, for
> example, an
> +// externally provided malloc implementation is XRay instrumented, and
> +// initializing the thread-locals involves calling into malloc. A malloc
> +// implementation that does global synchronization might be holding a
> lock for a
> +// critical section, calling a function that might be XRay instrumented
> (and
> ----------------
> dblaikie wrote:
> > Maybe this goes into more detail than needed if the XRay runtime in
> general shouldn't be using the standard library for all these reasons -
> that's a general requirement that wouldn't need to be rejustified at every
> use case/implementation detail?
> This is specifically for the function, since the original implementation
> (and the potentially more obvious solution) uses a thread-exit guard type
> (i.e. RAII but using a thread-local). Other parts of the implementation
> have avoided using memory allocation routines because of performance
> reasons but not entirely for correctness.
>
>
> ================
> Comment at: lib/xray/xray_fdr_logging_impl.h:190-191
> +  thread_local pthread_once_t key_once = PTHREAD_ONCE_INIT;
> +  thread_local std::aligned_storage<sizeof(ThreadLocalData),
> +                                    alignof(ThreadLocalData)>::type
> TLSBuffer;
> +  // Ensure that we only actually ever do the pthread initialization once.
> ----------------
> dblaikie wrote:
> > Seems the goal in general was/is to move away from C++ standard library
> usage in the runtime library, right? Is there an attempt to pick and choose
> which parts can be relied upon, or is the overall goal to actually remove
> C++ standard library usage entirely, like other parts of compiler-rt?
> >
> > If not, then maybe std::aligned_union<1, ThreadLocalData> might be
> marginally simpler here, but only marginally. (& I wouldn't say it's
> strictly better - since a union of only one thing is a bit strange)
> I think I should document why we're using a thread-local uninitialized
> memory at all in this situation. :)
>
> First, the type ThreadLocalData isn't trivially destructable. If this was
> a thread_local object,


I wasn't intending to suggest using a thread_local object, but replacing
the aligned_storage with aligned_union?


> we end up with the same problem we had in the first place -- having a
> thread_local object with a non-trivial destructor may cause the C++ runtime
> to call std::malloc(...) and introduce a deadlock when initialising the
> implementation as well as when destroying the object at thread-exit time.
>
> Second, having uninitialised buffer space makes the initialisation of the
> memory trivial. We do want uninitialised bytes because we don't need to
> zero out the bytes, since we're going to initialize it once using placement
> new anyway.
>

I would imagine the union's probably uninitialized too?


>
> So unfortunately, this is the solution that works and fits within the
> requirements we laid out. Updated with a bit more documentation on
> specifically why we're using an uninitialised aligned buffer. :)
>
> Removing the dependency on standard C++ library things that have runtime
> issues I think is a good goal, but is orthogonal to this particular change
> (since really we're just avoiding the use of `thread_local` non-trivially
> destructible objects).
>

Is that philosophy consistent with the rest of compiler-rt? Picking bits of
the standard library that are known/expected not to allocate or use other
'interesting' functions?


>
>
> https://reviews.llvm.org/D38073
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170927/5cf7f93f/attachment.html>


More information about the llvm-commits mailing list