[LLVMdev] Advice on implementing fast per-thread data

Tue Feb 5 16:24:26 PST 2008

On Tue, 5 Feb 2008, Chris Lattner wrote:

> On Mon, 4 Feb 2008, Brian Hurt wrote:
>> Another possibility, and I'm not sure how to do this in LLVM, would be to
>> sacrifice a register to hold the pointer to the unique per-thread
>> structure.  This would be worthwhile to me even on the register-starved
>> x86-32.  I suppose I could also just add a "hidden" (compiler-added and
>> -maintained) argument to every function which is the pointer to the
>> per-thread data.
>
> Thread local storage (TLS) on Linux is better than this.  Instead of
> sacrificing a GPR, it uses a segment register to reach the TLS area,
> making it very very cheap.
>
>> Using the normal thread-local storage scares me, because I don't know the
>> performance implications.
>
> You should read up about it then. :)
> Start here: http://people.redhat.com/drepper/tls.pdf
>
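The "hidden argument" option quoted above means passing an explicit per-thread context pointer through every call that might allocate. Here is a minimal sketch, assuming a bump-pointer allocator. The names `thread_ctx`, `ctx_alloc`, `alloc_ptr` and `alloc_limit` are hypothetical and do not come from any real runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread allocation state, passed around explicitly. */
struct thread_ctx {
    char *alloc_ptr;    /* next free byte in this thread's nursery */
    char *alloc_limit;  /* one past the last usable byte */
};

/* Every function that may allocate takes the context as an extra
 * argument ("hidden" if the compiler inserts and maintains it). */
static void *ctx_alloc(struct thread_ctx *ctx, size_t n)
{
    assert(ctx != NULL);
    if ((size_t)(ctx->alloc_limit - ctx->alloc_ptr) < n)
        return NULL;    /* slow path (e.g. a collection) omitted */
    void *p = ctx->alloc_ptr;
    ctx->alloc_ptr += n;
    return p;
}
```

The cost of this design is that the pointer occupies an argument slot in every call, and usually a register as well.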

Thank you.  You've just made my life about 3000% easier.  Somehow I had
missed __thread; I was thinking of the clunky POSIX thread-specific data
API (pthread_getspecific() and friends).
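For contrast, here is a minimal sketch of a per-thread counter implemented two ways. The first uses the POSIX thread-specific data calls (`pthread_once`, `pthread_key_create`, `pthread_getspecific`, `pthread_setspecific`). The second uses a `__thread` variable. The function and variable names are illustrative only. `__thread` is a GCC/Clang extension; C11 spells it `_Thread_local`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* POSIX way: one key, plus a heap-allocated slot per thread. */
static pthread_key_t counter_key;
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() is registered as the destructor for each thread's slot;
     * it runs when a thread exits via pthread_exit or returns from
     * its start routine. */
    int rc = pthread_key_create(&counter_key, free);
    assert(rc == 0);
    (void)rc;
}

static int bump_posix(void)
{
    pthread_once(&counter_once, make_key);
    int *p = pthread_getspecific(counter_key);
    if (p == NULL) {                  /* first use in this thread */
        p = calloc(1, sizeof *p);
        if (p == NULL)
            abort();
        pthread_setspecific(counter_key, p);
    }
    return ++*p;
}

/* __thread way: an ordinary-looking variable, one copy per thread. */
static __thread int counter_tls;

static int bump_tls(void)
{
    return ++counter_tls;
}
```

The key-based version typically pays for a library call and a pointer chase on every access. The `__thread` version is simply a variable.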

Playing around with this a little, I find that this code:
static __thread int i;      /* one copy of i per thread */

int foo(void) {
    i += 1;
    return i;
}

compiles (on x86-32) to:
foo:
         pushl   %ebp
         movl    %esp, %ebp
         movl    %gs:i@NTPOFF, %eax
         addl    $1, %eax
         movl    %eax, %gs:i@NTPOFF
         popl    %ebp
         ret

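So, other than the segment override, this is no different from accessing
an ordinary global variable, at least for a variable in the main
executable (the @NTPOFF relocation is the local-exec TLS model; code built
for a shared library may need a more expensive access sequence).  That
means I don't give up a single clock cycle of allocation speed in the
common case (actually doing a collection is a little bit trickier).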
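Here is a minimal sketch of what the common-case allocation could look like with `__thread`. It assumes a per-thread bump-pointer nursery. `NURSERY_SIZE`, `nursery_ptr`, `nursery_limit`, `gc_alloc` and `gc_alloc_slow` are hypothetical names. The slow path is only a stand-in: it grabs a fresh malloc'd chunk rather than running a real collection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NURSERY_SIZE 4096   /* hypothetical per-thread chunk size */

/* Per-thread bump-pointer state: every thread sees its own copy. */
static __thread char *nursery_ptr;
static __thread char *nursery_limit;

/* Slow-path stand-in. A real collector would collect here; this sketch
 * just starts a fresh malloc'd chunk and abandons the old one. */
static void *gc_alloc_slow(size_t n)
{
    char *chunk = malloc(NURSERY_SIZE);
    if (chunk == NULL)
        return NULL;
    nursery_ptr = chunk + n;
    nursery_limit = chunk + NURSERY_SIZE;
    return chunk;
}

/* Fast path: one compare and one bump, touching only thread-local
 * variables, so no lock and no extra argument is needed. */
static void *gc_alloc(size_t n)
{
    const size_t align = _Alignof(max_align_t);
    assert(n > 0);
    if (n > NURSERY_SIZE)
        return NULL;                      /* large objects: out of scope */
    n = (n + align - 1) & ~(align - 1);   /* keep results aligned */
    if (nursery_ptr != NULL &&
        (size_t)(nursery_limit - nursery_ptr) >= n) {
        void *p = nursery_ptr;
        nursery_ptr += n;
        return p;
    }
    return gc_alloc_slow(n);
}
```

In an executable on x86 Linux, the compiler can reach `nursery_ptr` and `nursery_limit` with the same segment-override addressing shown in the foo() listing. The fast path therefore costs about the same as it would with plain globals.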

Brian