[cfe-dev] __thread keyword, LLVM 3.2 & Xcode 4.6

Thu Nov 8 10:01:57 PST 2012

On Thu, Nov 8, 2012 at 1:08 AM, Jean-Daniel Dupas <devlists at shadowlab.org> wrote:

>
> On Nov 7, 2012, at 21:16, Matthieu Monrocq <matthieu.monrocq at gmail.com>
> wrote:
>
>
>
> On Wed, Nov 7, 2012 at 12:47 AM, Seth Cantrell <seth.cantrell at gmail.com> wrote:
>
>> On Nov 6, 2012, at 4:04 AM, Jean-Daniel Dupas <devlists at shadowlab.org>
>> wrote:
>>
>> >
>> > On Nov 6, 2012, at 01:37, James Gregurich <bayoubengal at me.com> wrote:
>> >
>> >> hi.
>> >>
>> >> I just updated to Xcode 4.6.  I note the following:
>> >>
>> >>
>> >> $ /Applications/Xcode46-DP1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang --version
>> >> Apple clang version 4.2 (tags/Apple/clang-424.0.11) (based on LLVM 3.2svn)
>> >> Target: x86_64-apple-darwin12.2.0
>> >> Thread model: posix
>> >>
>> >>
>> >> It is my understanding from the release notes that LLVM 3.2 is supposed
>> >> to support thread-local storage. I just re-ran my test using the
>> >> '__thread' keyword from the last time I asked about this, and I still
>> >> get just one instance of the object rather than one per thread.
>> >
>> >
>> > The __thread keyword is a C extension (it is not part of the standard).
>> > Using it with C++ is even less specified than using it with C.
>> >
>> > Moreover, it was already pointed out in the previous discussion that
>> > supporting C++ TLS requires OS support. Updating Xcode does not change that.
>> >
>>
>> gcc 4.8 now implements thread_local with a performance penalty for global
>> thread_local variables: http://gcc.gnu.org/gcc-4.8/changes.html#cxx
>>
>> I guess that function-local thread_local variables can use the same
>> scheme for initialization as function-local static variables
>
>
>
> I would be very interested to know what this "penalty" is. I have a couple
> of ideas of what it *could* be, but no idea what it really is.
>
>
> Actually, it looks like GCC converts each thread_local access into a
> function call with lazy initialization of the thread_local variable.
>
>
> http://stackoverflow.com/questions/13106049/c11-gcc-4-8-thread-local-performance-penalty
> (especially the third answer, which was posted after your last comment on
> that same page)
>
> There are two things I would like to know, though: how does it handle
> destruction at the end of the thread, and why can't it avoid the access
> penalty for POD and base types? The compiler should be smart enough to
> detect which types require complex access and which types support direct
> access.
>
> -- Matthieu
>
>
> -- Jean-Daniel
>

Thanks!

I had not seen Kenny's answer, glad someone finally had more than an
educated guess to present.

Regarding the penalty for simple POD construction, unfortunately avoiding it
might not be trivial. For a `static thread_local` it's quite obvious whether
the value can be computed right off the bat or not; for an `extern
thread_local`, however, the initializer is invisible to the compiler, so the
optimization is not possible.
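
To make the distinction concrete, here is a minimal sketch. It is not GCC's
actual generated code; it only assumes that the lazy-initialization wrapper
described in that answer can be approximated by hand with a function-local
thread_local. All names (`compute_initial_value`, `fast_counter`,
`slow_counter`, `bump_on_new_thread`) are made up for illustration.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for an initializer the compiler cannot evaluate
// at compile time (or cannot see at all, as with an extern declaration).
int compute_initial_value() { return 42; }

// Constant initializer, visible in this translation unit: every access
// can go straight to the thread's copy.
static thread_local int fast_counter = 0;

// Hand-written approximation of an access wrapper: the variable is
// initialized on the first call made by each thread, and every access
// goes through a function call.
int& slow_counter() {
    thread_local int value = compute_initial_value();
    return value;
}

// Increments both variables on a fresh thread and reports what it saw.
int bump_on_new_thread() {
    int seen = 0;
    std::thread worker([&seen] {
        ++fast_counter;    // 0 -> 1 in the worker's own copy
        ++slow_counter();  // 42 -> 43 in the worker's own copy
        seen = fast_counter * 1000 + slow_counter();  // 1 * 1000 + 43
    });
    worker.join();
    // The calling thread's copy was never touched by the worker.
    assert(fast_counter == 0);
    return seen;
}
```

Every call to `bump_on_new_thread()` returns 1043, since each new thread
starts from fresh copies, while the calling thread still sees 0 and 42.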

Regarding destructors, I don't see how it could be supported.

All in all I would have preferred that they went with a scheme similar to
C++ globals, with one function for initialization and another for
destruction, called upon thread entry and destruction. Furthermore, it's
unclear to me what interactions this has with the `std::async` deferred
policy. With the implementation being able to use a thread pool under the
hood, this would require recycling the thread_local variables... and I doubt
it's covered.
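
For reference, a small sketch of the part I am sure about: what the two
standard launch policies do on a conforming implementation. It says nothing
about thread pools, and `tls_value`, `read_with_deferred` and
`read_with_async` are names made up for illustration.

```cpp
#include <cassert>
#include <future>

// One instance per thread, zero-initialized in every new thread.
thread_local int tls_value = 0;

// Deferred policy: the task runs lazily inside get(), on the calling
// thread, so it observes the caller's copy of tls_value.
int read_with_deferred() {
    tls_value = 7;
    std::future<int> result =
        std::async(std::launch::deferred, [] { return tls_value; });
    return result.get();
}

// Async policy: the task runs as if on a new thread, so it observes a
// freshly initialized copy rather than the caller's.
int read_with_async() {
    tls_value = 7;
    std::future<int> result =
        std::async(std::launch::async, [] { return tls_value; });
    int seen = result.get();
    assert(tls_value == 7);  // the caller's copy is unaffected by the task
    return seen;
}
```

With the deferred policy the task reads 7, the caller's value; with the async
policy it should read 0. An implementation that reuses pooled threads would
have to reset such variables between tasks to keep that guarantee.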

-- Matthieu