[PATCH] D38073: [XRay][compiler-rt] Use pthread for initializing thread-local data

David Blaikie via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 28 11:11:21 PDT 2017


On Wed, Sep 27, 2017 at 5:36 PM Dean Michael Berris <dberris at google.com>
wrote:

> On 28 Sep 2017, at 03:47, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Wed, Sep 20, 2017 at 6:15 PM Dean Michael Berris via Phabricator <
> reviews at reviews.llvm.org> wrote:
>
>> dberris added a comment.
>>
>> Thanks for the review!
>>
>>
>>
>> ================
>> Comment at: lib/xray/xray_fdr_logging_impl.h:110
>>  // call so that it can be initialized on first use instead of as a
>> global.
>> -struct ThreadLocalData {
>> +struct alignas(64) ThreadLocalData {
>>    BufferQueue::Buffer Buffer;
>> ----------------
>> dblaikie wrote:
>> > Presumably this should use LLVM_ALIGNAS?
>> >
>> > Is this related to the rest of the patch, if so, how? Is it for
>> correctness or performance?
>> It's related in that we'd like to use aligned uninitialised storage to
>> store one of these objects. We'd the buffer to not only be correctly sized
>> but also aligned correctly,
>
> since we're going to interpret the buffer's address as a different type
>> (punning? aliasing?
>
>
> Not sure I follow, the buffer:
>
>  thread_local std::aligned_storage<sizeof(ThreadLocalData),
>                                     alignof(ThreadLocalData)>::type
> TLSBuffer;
>
> is requested to be of the appropriate alignment for ThreadLocalData? So
> why would ThreadLocalData need to request any specific alignment itself?
>
>
> I thought you'd need the specific alignment so that pointers of the type
> would be properly aligned
>

Why do you need a specific pointer alignment? Are you using the low bits
for something like PointerIntPair?


> -- thus helping the compiler decide more appropriate aligned versions of
> the operations on members. If that's not the case, then that's fine.
>

Not sure I understand. If the members have certain minimum alignment
requirements naturally, the compiler will ensure they're aligned
sufficiently and depend on that for operations on them, etc.


>
> Or do you mean the contents of the Buffer member will be used to store
> other things? Then perhaps it'd be more appropriate to put the alignment
> attribute on the member? (& an assert somewhere that compares the alignment
> of the thing being stored with the alignment of the buffer itself).
>
>
>> I can never get the right terminology for that). The performance benefit
>> is a side-effect of the correctness issue (since we have non-trivially
>> destructible things in it).
>>
>> BTW, I can't use LLVM_ALIGNAS because that seems to be undefined in
>> compiler-rt. :/
>>
>
> Is using the attribute directly sufficiently portable for compiler-rt? Are
> there other uses of alignment in compiler-rt to compare/check against?
>
>
>
> It's C++11, and all of compiler-rt seems to be written with the
> requirement of C++11 so it's suitably portable that way. Removed it anyway.
>
>
>>
>> ================
>> Comment at: lib/xray/xray_fdr_logging_impl.h:145
>> +// the FDR logging implementation. The implementation details require a
>> bit of
>> +// care to maintain, so document the semantics that we want to achieve
>> here.
>> +//
>> ----------------
>> dblaikie wrote:
>> > "so document the semantics that we want to achieve here" seems weird -
>> that sounds like a patch note or feature request (& somewhat redundant, it
>> feels like - the reader can probably see that the semantics are documented
>> below), rather than a statement of fact.
>> >
>> > "the semantics are documented below" or maybe "The following semantics
>> are intended:" or skip this comment entirely? Not sure.
>> I like skipping it. Stream of consciousness comments at the end of the
>> day don't come out very well. :)
>>
>>
>> ================
>> Comment at: lib/xray/xray_fdr_logging_impl.h:175-183
>> +// We're doing this to avoid using a `thread_local` object that has a
>> +// non-trivial destructor, because the C++ runtime might call
>> std::malloc(...)
>> +// to register calls to destructors. Deadlocks may arise when, for
>> example, an
>> +// externally provided malloc implementation is XRay instrumented, and
>> +// initializing the thread-locals involves calling into malloc. A malloc
>> +// implementation that does global synchronization might be holding a
>> lock for a
>> +// critical section, calling a function that might be XRay instrumented
>> (and
>> ----------------
>> dblaikie wrote:
>> > Maybe this goes into more detail than needed if the XRay runtime in
>> general shouldn't be using the standard library for all these reasons -
>> that's a general requirement that wouldn't need to be rejustified at every
>> use case/implementation detail?
>> This is specifically for the function, since the original implementation
>> (and the potentially more obvious solution) uses a thread-exit guard type
>> (i.e. RAII but using a thread-local). Other parts of the implementation
>> have avoided using memory allocation routines because of performance
>> reasons but not entirely for correctness.
>>
>>
>> ================
>> Comment at: lib/xray/xray_fdr_logging_impl.h:190-191
>> +  thread_local pthread_once_t key_once = PTHREAD_ONCE_INIT;
>> +  thread_local std::aligned_storage<sizeof(ThreadLocalData),
>> +                                    alignof(ThreadLocalData)>::type
>> TLSBuffer;
>> +  // Ensure that we only actually ever do the pthread initialization
>> once.
>> ----------------
>> dblaikie wrote:
>> > Seems the goal in general was/is to move away from C++ standard library
>> usage in the runtime library, right? Is there an attempt to pick and choose
>> which parts can be relied upon, or is the overall goal to actually remove
>> C++ standard library usage entirely, like other parts of compiler-rt?
>> >
>> > If not, then maybe std::aligned_union<1, ThreadLocalData> might be
>> marginally simpler here, but only marginally. (& I wouldn't say it's
>> strictly better - since a union of only one thing is a bit strange)
>> I think I should document why we're using a thread-local uninitialized
>> memory at all in this situation. :)
>>
>> First, the type ThreadLocalData isn't trivially destructable. If this was
>> a thread_local object,
>
>
> I wasn't intending to suggest using a thread_local object, but replacing
> the aligned_storage with aligned_union?
>
>
>
> Just read up on std::aligned_union, it seems it's overkill for this case
> as it's equivalent to just the aligned storage. Used std::aligned_union
> anyway if that's more common to use than std::aligned_storage.
>
> we end up with the same problem we had in the first place -- having a
>> thread_local object with a non-trivial destructor may cause the C++ runtime
>> to call std::malloc(...) and introduce a deadlock when initialising the
>> implementation as well as when destroying the object at thread-exit time.
>>
>> Second, having uninitialised buffer space makes the initialisation of the
>> memory trivial. We do want uninitialised bytes because we don't need to
>> zero out the bytes, since we're going to initialize it once using placement
>> new anyway.
>>
>
> I would imagine the union's probably uninitialized too?
>
>
>
> The docs on std::aligned_union suggest you're right. But now it does look
> a little weird that it's a union with just one type.
>

*nod* whichever seems better - just a thought.


>
>
>> So unfortunately, this is the solution that works and fits within the
>> requirements we laid out. Updated with a bit more documentation on
>> specifically why we're using an uninitialised aligned buffer. :)
>>
>> Removing the dependency on standard C++ library things that have runtime
>> issues I think is a good goal, but is orthogonal to this particular change
>> (since really we're just avoiding the use of `thread_local` non-trivially
>> destructible objects).
>>
>
> Is that philosophy consistent with the rest of compiler-rt? Picking bits
> of the standard library that are known/expected not to allocate or use
> other 'interesting' functions?
>
>
>
> I'm not sure whether that's written down somewhere that I can read as
> policy (I sure haven't seen anything to that effect) or whether that's
> something I'd need to infer myself.
>

Infer to some degree, but probably also talk to the rest of the folks
(sanitizers) in compiler-rt (I'd suggest you start with Kostya) to have a
good idea of the goals, gotchas, etc, and maybe write it down somewhere.


> For XRay, it's one of convenience -- if we can use something in the
> standard library for making things simpler, i.e. like using the algorithms
> and some utility types, then that should be fair game.
>
> Systematically removing the dependency on containers or things that have
> runtime components seem to be the minimum required to make this work
> without severely limiting the available tools we have from the STL. This
> change doesn't involve much of the STL aside from the aligned storage
> types. The avoidance of nontrivially destructible thread_local objects has
> more to do with the deadlock avoidance than anything else.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170928/5e94f0e5/attachment.html>


More information about the llvm-commits mailing list