[PATCH] D38073: [XRay][compiler-rt] Use pthread for initializing thread-local data

Dean Michael Berris via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 27 17:36:14 PDT 2017


> On 28 Sep 2017, at 03:47, David Blaikie <dblaikie at gmail.com> wrote:
> 
> 
> 
> On Wed, Sep 20, 2017 at 6:15 PM Dean Michael Berris via Phabricator <reviews at reviews.llvm.org <mailto:reviews at reviews.llvm.org>> wrote:
> dberris added a comment.
> 
> Thanks for the review!
> 
> 
> 
> ================
> Comment at: lib/xray/xray_fdr_logging_impl.h:110
>  // call so that it can be initialized on first use instead of as a global.
> -struct ThreadLocalData {
> +struct alignas(64) ThreadLocalData {
>    BufferQueue::Buffer Buffer;
> ----------------
> dblaikie wrote:
> > Presumably this should use LLVM_ALIGNAS?
> >
> > Is this related to the rest of the patch, if so, how? Is it for correctness or performance?
> It's related in that we'd like to use aligned uninitialised storage to store one of these objects. We'd the buffer to not only be correctly sized but also aligned correctly, 
> since we're going to interpret the buffer's address as a different type (punning? aliasing?
> 
> Not sure I follow, the buffer:
> 
>  thread_local std::aligned_storage<sizeof(ThreadLocalData),
>                                     alignof(ThreadLocalData)>::type TLSBuffer;
> 
> is requested to be of the appropriate alignment for ThreadLocalData? So why would ThreadLocalData need to request any specific alignment itself?
> 

I thought you'd need the specific alignment so that pointers of the type would be properly aligned -- thus helping the compiler decide more appropriate aligned versions of the operations on members. If that's not the case, then that's fine.

> Or do you mean the contents of the Buffer member will be used to store other things? Then perhaps it'd be more appropriate to put the alignment attribute on the member? (& an assert somewhere that compares the alignment of the thing being stored with the alignment of the buffer itself).
>  
> I can never get the right terminology for that). The performance benefit is a side-effect of the correctness issue (since we have non-trivially destructible things in it).
> 
> BTW, I can't use LLVM_ALIGNAS because that seems to be undefined in compiler-rt. :/
> 
> Is using the attribute directly sufficiently portable for compiler-rt? Are there other uses of alignment in compiler-rt to compare/check against?
>  

It's C++11, and all of compiler-rt seems to be written with the requirement of C++11 so it's suitably portable that way. Removed it anyway.

> 
> 
> ================
> Comment at: lib/xray/xray_fdr_logging_impl.h:145
> +// the FDR logging implementation. The implementation details require a bit of
> +// care to maintain, so document the semantics that we want to achieve here.
> +//
> ----------------
> dblaikie wrote:
> > "so document the semantics that we want to achieve here" seems weird - that sounds like a patch note or feature request (& somewhat redundant, it feels like - the reader can probably see that the semantics are documented below), rather than a statement of fact.
> >
> > "the semantics are documented below" or maybe "The following semantics are intended:" or skip this comment entirely? Not sure.
> I like skipping it. Stream of consciousness comments at the end of the day don't come out very well. :)
> 
> 
> ================
> Comment at: lib/xray/xray_fdr_logging_impl.h:175-183
> +// We're doing this to avoid using a `thread_local` object that has a
> +// non-trivial destructor, because the C++ runtime might call std::malloc(...)
> +// to register calls to destructors. Deadlocks may arise when, for example, an
> +// externally provided malloc implementation is XRay instrumented, and
> +// initializing the thread-locals involves calling into malloc. A malloc
> +// implementation that does global synchronization might be holding a lock for a
> +// critical section, calling a function that might be XRay instrumented (and
> ----------------
> dblaikie wrote:
> > Maybe this goes into more detail than needed if the XRay runtime in general shouldn't be using the standard library for all these reasons - that's a general requirement that wouldn't need to be rejustified at every use case/implementation detail?
> This is specifically for the function, since the original implementation (and the potentially more obvious solution) uses a thread-exit guard type (i.e. RAII but using a thread-local). Other parts of the implementation have avoided using memory allocation routines because of performance reasons but not entirely for correctness.
> 
> 
> ================
> Comment at: lib/xray/xray_fdr_logging_impl.h:190-191
> +  thread_local pthread_once_t key_once = PTHREAD_ONCE_INIT;
> +  thread_local std::aligned_storage<sizeof(ThreadLocalData),
> +                                    alignof(ThreadLocalData)>::type TLSBuffer;
> +  // Ensure that we only actually ever do the pthread initialization once.
> ----------------
> dblaikie wrote:
> > Seems the goal in general was/is to move away from C++ standard library usage in the runtime library, right? Is there an attempt to pick and choose which parts can be relied upon, or is the overall goal to actually remove C++ standard library usage entirely, like other parts of compiler-rt?
> >
> > If not, then maybe std::aligned_union<1, ThreadLocalData> might be marginally simpler here, but only marginally. (& I wouldn't say it's strictly better - since a union of only one thing is a bit strange)
> I think I should document why we're using a thread-local uninitialized memory at all in this situation. :)
> 
> First, the type ThreadLocalData isn't trivially destructable. If this was a thread_local object,
> 
> I wasn't intending to suggest using a thread_local object, but replacing the aligned_storage with aligned_union?
>  

Just read up on std::aligned_union, it seems it's overkill for this case as it's equivalent to just the aligned storage. Used std::aligned_union anyway if that's more common to use than std::aligned_storage.

> we end up with the same problem we had in the first place -- having a thread_local object with a non-trivial destructor may cause the C++ runtime to call std::malloc(...) and introduce a deadlock when initialising the implementation as well as when destroying the object at thread-exit time.
> 
> Second, having uninitialised buffer space makes the initialisation of the memory trivial. We do want uninitialised bytes because we don't need to zero out the bytes, since we're going to initialize it once using placement new anyway.
> 
> I would imagine the union's probably uninitialized too?
>  

The docs on std::aligned_union suggest you're right. But now it does look a little weird that it's a union with just one type.

> 
> So unfortunately, this is the solution that works and fits within the requirements we laid out. Updated with a bit more documentation on specifically why we're using an uninitialised aligned buffer. :)
> 
> Removing the dependency on standard C++ library things that have runtime issues I think is a good goal, but is orthogonal to this particular change (since really we're just avoiding the use of `thread_local` non-trivially destructible objects).
> 
> Is that philosophy consistent with the rest of compiler-rt? Picking bits of the standard library that are known/expected not to allocate or use other 'interesting' functions?
>  

I'm not sure whether that's written down somewhere that I can read as policy (I sure haven't seen anything to that effect) or whether that's something I'd need to infer myself. For XRay, it's one of convenience -- if we can use something in the standard library for making things simpler, i.e. like using the algorithms and some utility types, then that should be fair game.

Systematically removing the dependency on containers or things that have runtime components seem to be the minimum required to make this work without severely limiting the available tools we have from the STL. This change doesn't involve much of the STL aside from the aligned storage types. The avoidance of nontrivially destructible thread_local objects has more to do with the deadlock avoidance than anything else.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170928/424bb9f3/attachment.html>


More information about the llvm-commits mailing list