[cfe-dev] __thread keyword, LLVM 3.2 & Xcode 4.6
matthieu.monrocq at gmail.com
Thu Nov 8 10:01:57 PST 2012
On Thu, Nov 8, 2012 at 1:08 AM, Jean-Daniel Dupas <devlists at shadowlab.org> wrote:
> On 7 Nov 2012, at 21:16, Matthieu Monrocq <matthieu.monrocq at gmail.com>
> wrote:
> On Wed, Nov 7, 2012 at 12:47 AM, Seth Cantrell <seth.cantrell at gmail.com> wrote:
>> On Nov 6, 2012, at 4:04 AM, Jean-Daniel Dupas <devlists at shadowlab.org> wrote:
>> > On 6 Nov 2012, at 01:37, James Gregurich <bayoubengal at me.com> wrote:
>> >> hi.
>> >> I just updated to Xcode 4.6. I note the following:
>> >> $
>> >> Apple clang version 4.2 (tags/Apple/clang-424.0.11) (based on LLVM
>> >> Target: x86_64-apple-darwin12.2.0
>> >> Thread model: posix
>> >> It is my understanding from the release notes that LLVM 3.2 supports
>> >> thread-local storage. I just re-ran my test using the '__thread'
>> >> keyword from the last time I asked about this, and I still get only one
>> >> instance of the object rather than one per thread.
>> > The __thread keyword is a C extension (it is not part of the standard).
>> > Using it with C++ is even less well specified than using it with C.
>> > Moreover, it was already pointed out in the previous discussion that
>> > supporting C++ TLS requires OS support. Updating Xcode does not change that.
>> gcc 4.8 now implements thread_local with a performance penalty for global
>> thread_local variables: http://gcc.gnu.org/gcc-4.8/changes.html#cxx
>> I guess that function-local thread_local variables can use the same
>> scheme for initialization as function-local static variables.
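To make that comparison concrete, here is a minimal C++11 sketch. It assumes a compiler and OS that actually support `thread_local`, which, as noted above, Xcode 4.6 on Darwin does not. A function-local `thread_local` is initialized the first time control reaches its declaration in each thread, just as a function-local `static` is initialized the first time control reaches it at all. The names `expensive_init`, `per_thread_value` and `worker_result` are hypothetical.

```cpp
#include <atomic>
#include <thread>

// Counts how many times the (hypothetical) expensive initializer has run.
std::atomic<int> g_init_calls{0};

int expensive_init() {
    g_init_calls.fetch_add(1);
    return 42;
}

// Function-local thread_local: initialized on the first call *in each thread*,
// much like a function-local static is initialized on the first call overall.
int& per_thread_value() {
    thread_local int value = expensive_init();
    return value;
}

// Starts a worker thread that calls the accessor several times and bumps the
// value; the initializer runs only on that thread's first call.
int worker_result() {
    int result = 0;
    std::thread t([&result] {
        for (int i = 0; i < 3; ++i) {
            per_thread_value() += 1;
        }
        result = per_thread_value();
    });
    t.join();
    return result;
}
```

Each new thread starts from 42 and adds 3, so `worker_result()` returns 45 every time, and `expensive_init()` runs once per thread that calls `per_thread_value()`.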
> I would be very interested to know what this "penalty" is. I have a couple
> of ideas of what it *could* be, but no idea what it really is.
> Actually, it looks like GCC converts thread_local accesses into function
> calls that lazily initialize the thread_local variable.
> http://stackoverflow.com/questions/13106049/c11-gcc-4-8-thread-local-performance-penalty
> (especially the third answer, which was posted after your last comment on
> this same page)
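Roughly, the scheme described there can be emulated by hand: constant-initialized per-thread storage, plus an accessor function that constructs the object on first use in the current thread. This is a sketch of the idea under that assumption, not the code GCC actually generates. `Widget`, `WidgetStorage` and `tls_widget` are made-up names.

```cpp
#include <new>
#include <string>
#include <thread>
#include <utility>

// A type with a non-trivial constructor, so it cannot be statically initialized.
struct Widget {
    std::string name;
    explicit Widget(std::string n) : name(std::move(n)) {}
};

// Raw, suitably aligned bytes for one Widget. Being a trivial type, it is
// zero-initialized per thread without running any user code.
struct alignas(Widget) WidgetStorage {
    unsigned char bytes[sizeof(Widget)];
};

thread_local WidgetStorage tls_widget_storage;
thread_local Widget* tls_widget_ptr = nullptr;  // null until first use in this thread

// Every access goes through this function: it checks whether this thread has
// constructed its Widget yet and, if not, constructs it in place.
// Note: this sketch never destroys the Widget at thread exit.
Widget& tls_widget() {
    if (tls_widget_ptr == nullptr) {
        tls_widget_ptr = ::new (static_cast<void*>(tls_widget_storage.bytes)) Widget("default");
    }
    return *tls_widget_ptr;
}
```

The branch in `tls_widget()` on every access is the kind of cost the "penalty" refers to. The sketch also never runs `~Widget()`, which is exactly the destruction question raised next.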
> There are two things I would like to know, though: how does it handle
> destruction at the end of the thread, and why can't it avoid the access
> penalty for POD and built-in types? The compiler should be smart enough to
> detect which types require complex access and which types support direct
> access.
> -- Matthieu
> -- Jean-Daniel
I had not seen Kenny's answer; I'm glad someone finally had more than an
educated guess to present.
Regarding the penalty for simple POD construction, unfortunately avoiding it
might not be trivial. For a `static thread_local` it's quite obvious whether
the value can be computed right off the bat or not; however, for an
`extern thread_local` the initializer is invisible, so the optimization is
not possible.
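A small illustration, assuming C++11 `thread_local` support. Both variables below have the same POD type (`int`), yet only one needs per-thread initialization code, because what matters is the initializer rather than the type. `next_id`, `tls_fixed`, `tls_thread_id` and `id_seen_by_new_thread` are hypothetical names.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> g_next_id{0};

// Hypothetical helper: hands out a fresh positive id each time it is called.
int next_id() { return g_next_id.fetch_add(1) + 1; }

// Same type, different initializers:
// - tls_fixed has a constant initializer, so no per-thread code has to run.
// - tls_thread_id needs next_id() to run once in every thread that uses it,
//   so an access must first make sure that has happened.
thread_local int tls_fixed = 7;
thread_local int tls_thread_id = next_id();

// Reads tls_thread_id from a freshly started thread.
int id_seen_by_new_thread() {
    int id = 0;
    std::thread t([&id] { id = tls_thread_id; });
    t.join();
    return id;
}
```

Each thread that reads `tls_thread_id` gets its own value from `next_id()`, so two worker threads see different ids. A file that only saw `extern thread_local int tls_thread_id;` could not tell whether such an initializer exists, so it would have to assume the lazy-initialization path is needed.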
Regarding destructors, I don't see how it could be supported.
All in all, I would have preferred that they went with a scheme similar to
C++ globals, with one function for initialization and another for
destruction, called upon thread entry and thread exit. Furthermore, it's
unclear to me how this interacts with the `std::async` deferred policy. Since
the implementation may use a thread pool under the hood, this
would require recycling the thread_local variables... and I doubt it's
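For comparison, a rough sketch of the alternative scheme described above, assuming threads are created through a helper that brackets the thread body with an initialization hook and a destruction hook. `ThreadState`, `tls_init`, `tls_fini` and `spawn_with_tls` are hypothetical names, not an existing API.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Hypothetical per-thread state managed by explicit hooks instead of lazy
// initialization on first access.
struct ThreadState {
    int counter = 0;
};

thread_local ThreadState* tls_state = nullptr;  // constant-initialized, no wrapper needed
std::atomic<int> g_live_states{0};              // how many ThreadState objects exist

// Called once when a thread starts (analogous to a global initializer).
void tls_init() {
    tls_state = new ThreadState();
    g_live_states.fetch_add(1);
}

// Called once when a thread finishes (analogous to a global destructor).
void tls_fini() {
    delete tls_state;
    tls_state = nullptr;
    g_live_states.fetch_sub(1);
}

// Starts a thread whose body is bracketed by the init/fini hooks, so code in
// 'body' can use tls_state directly without any first-use check.
// (If 'body' throws, std::terminate is called, as for any thread function.)
std::thread spawn_with_tls(std::function<void()> body) {
    return std::thread([body] {
        tls_init();
        body();
        tls_fini();
    });
}
```

Because `tls_state` is set before any user code runs, accesses need no check; the cost moves to thread start and exit. A thread pool that reuses threads would then have to decide whether to run the hooks again between tasks, which is the recycling problem mentioned above.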