[LLVMdev] "fork" and "sync" for LLVM thread support - any comments?
Duraid Madina
duraid at octopus.com.au
Mon Oct 30 01:56:23 PST 2006
Dear all,
Recently I've wanted to add support for threads to LLVM (motivated by
OpenMP, more or less), but before jumping in and implementing anything,
I thought it might be a good idea to describe what I have in mind and
ask for comments. Hence this email - if anyone has any comments, I'd be
very glad to hear them.
WHAT I'M PROPOSING:
The addition of two instructions - fork and sync - to the basic LLVM
language.
1) fork
syntax: fork void <fn ptr>(<arg list>)
fork is basically call, and it does what you think: an independent flow
of execution begins at a given function (with no speed/synchronicity
guarantees whatsoever). The main difference is that you can *only* fork
functions that return void. The forking thread continues without
waiting, while the forked thread either disappears when it reaches a ret
void, or continues executing forever.
example: fork void %doom(int %phd, int 666)
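For concreteness, here is a rough sketch (not part of the proposal) of what
a backend or support library might lower the example above into on a
pthreads system; fork_doom and doom_trampoline are made-up names, purely
for illustration:

#include <pthread.h>
#include <stdlib.h>

/* The function being forked (returns void, as fork requires). */
extern void doom(int phd, int magic);

struct doom_args { int phd; int magic; };

static void *doom_trampoline(void *p) {
    struct doom_args *a = p;
    doom(a->phd, a->magic);    /* runs until it reaches "ret void"...   */
    free(a);
    return NULL;               /* ...then the thread simply disappears  */
}

/* fork void %doom(int %phd, int 666) */
void fork_doom(int phd) {
    pthread_t tid;
    pthread_attr_t attr;
    struct doom_args *a = malloc(sizeof *a);
    a->phd = phd;
    a->magic = 666;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_create(&tid, &attr, doom_trampoline, a);  /* forker does not wait */
    pthread_attr_destroy(&attr);
}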
2) sync
syntax: sync [how] <type> (<token list>)
sync is the synchronization primitive: when a thread reaches a sync
instruction, it does not proceed until all tokens in its token list have
been sync'd by other threads. For example, if thread A issues:
sync int (1, 2)
then it will block until thread B issues, say:
sync int (2, 1)
or perhaps:
sync int (1)
sync int (2)
at which point both thread A and thread B will resume. Alternatively,
B could issue:
sync int (1, 2, 3)
at which point A would resume, but B would block pending a sync(3) from
somewhere.
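To pin the semantics down, here is a rough C model of the token matching
just described (purely illustrative, assuming a pthreads host;
llvm_sync_model, MAX_TOKEN and the 16-token limit are made up). Each
announcement of a token must be paired with exactly one announcement of
the same token by some other sync before either side may proceed:

#include <pthread.h>

#define MAX_TOKEN 256

static pthread_mutex_t sync_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  sync_cond = PTHREAD_COND_INITIALIZER;
static unsigned arrivals[MAX_TOKEN];     /* total announcements, per token */

void llvm_sync_model(const unsigned *tokens, int ntokens) {
    unsigned pair[16];                   /* model limit: <= 16 tokens per sync */
    int i;
    pthread_mutex_lock(&sync_lock);
    /* Phase 1: announce every token in our list. */
    for (i = 0; i < ntokens; i++) {
        unsigned seq = ++arrivals[tokens[i]];   /* our arrival number, 1-based */
        pair[i] = (seq + 1) / 2;                /* the pairing we belong to    */
    }
    pthread_cond_broadcast(&sync_cond);
    /* Phase 2: block until every one of our tokens has been paired. */
    for (i = 0; i < ntokens; i++)
        while (arrivals[tokens[i]] < 2 * pair[i])
            pthread_cond_wait(&sync_cond, &sync_lock);
    pthread_mutex_unlock(&sync_lock);
}

With this model, sync int (1, 2) in thread A and sync int (2, 1) in thread
B release each other, and sync int (1, 2, 3) leaves B blocked until
someone announces token 3.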
The optional [how] parameter is there for performance reasons: it is
simply a way to request that LLVM generate code to implement the sync in
a particular way, e.g.
AtomicOpBusyWait
AtomicOpYielding
Signal
etc.
That's it - fork and sync are (I hope) all that's required.
A PATRONIZING EXAMPLE:
Summing the numbers in an array, using two threads:
%sumB = global int 0

int %sumfun(int* %base, int %offset, int %count) {
  [simple loop adding numbers from base[offset] to base[offset+count]]
}

void %sumwrap(int* %base, int %offset, int %count) {
  %mysum = call int %sumfun(int* %base, int %offset, int %count)
  store int %mysum, int* %sumB
  sync ubyte (42)
  ret void
}

int %main() {
  %array = malloc int, uint 1000
  fork void %sumwrap(int* %array, int 500, int 500)
  %sumA = call int %sumfun(int* %array, int 0, int 500)
  sync ubyte (42)
  %sumBval = load int* %sumB
  %sum = add int %sumA, %sumBval
  ret int %sum
}
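For comparison, here is roughly the same two-thread sum written directly
in C against the illustrative pthreads sketches above; sum_main and
llvm_sync_model are made-up names, and a detached pthread plays the role
of fork:

#include <pthread.h>

extern void llvm_sync_model(const unsigned *tokens, int ntokens);

static int sumB;                        /* mirrors the %sumB global */

static int sumfun(int *base, int offset, int count) {
    int i, s = 0;
    for (i = 0; i < count; i++)
        s += base[offset + i];
    return s;
}

static void *sumwrap(void *p) {         /* the forked worker */
    unsigned token = 42;
    sumB = sumfun(p, 500, 500);
    llvm_sync_model(&token, 1);         /* sync ubyte (42) */
    return NULL;
}

int sum_main(int *array) {              /* array holds 1000 ints */
    pthread_t tid;
    unsigned token = 42;
    int sumA;
    pthread_create(&tid, NULL, sumwrap, array);   /* fork void %sumwrap(...) */
    pthread_detach(tid);
    sumA = sumfun(array, 0, 500);
    llvm_sync_model(&token, 1);                   /* sync ubyte (42) */
    return sumA + sumB;
}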
.. this example is simple enough, but of course there are many other
possibilities; e.g. a more OpenMP-like approach would be to use *three*
threads, and have main() fork two workers, handing each a result pointer
and a sync token:
void %sumwrap(int* %base, int %offset, int %count,
              int* %result, ubyte %token) {
  %mysum = call int %sumfun(int* %base, int %offset, int %count)
  store int %mysum, int* %result
  sync ubyte (%token)
  ret void
}

int %main() {
  %array = malloc int, uint 1000
  %sumA = alloca int
  %sumB = alloca int
  fork void %sumwrap(int* %array, int 0, int 500, int* %sumA, ubyte 42)
  fork void %sumwrap(int* %array, int 500, int 500, int* %sumB, ubyte 43)
  sync ubyte (42, 43)
  %a = load int* %sumA
  %b = load int* %sumB
  %sum = add int %a, %b
  ret int %sum
}
..etc etc.
ASSORTED COMMENTS OF MY OWN:
What I see as positive aspects of this proposal:
- It is relatively simple.
- It makes no effort to hand-hold the developer: this is *LL*VM, after
all; e.g. in the example above, %sumfun() needs to be re-entrant, and if
its loop counter were a single global variable, things would obviously
break down. That's fine. OpenMP does allow variables to be explicitly
marked shared or thread-private, but supporting that is, IMHO, an issue
for any OpenMP->LLVM compiler to take care of (e.g. by building
thread-specialized copies of functions/variables as necessary), rather
than something LLVM should talk about explicitly: changing LLVM so that
variables *at the LLVM level* could be shared or private seems too
high-level to me (not to mention being a very invasive change).
- It is flexible in terms of the synchronization schedules that can be
implemented. While semantically there is just a "big bag of shared,
atomic booleans" (sync tokens), you can use whatever token type you
like. There is also no need to worry about a proliferation of tokens: a
massive sync statement with n tokens could be replaced by a log(n)-deep
tree of much smaller ones, at the cost of some intermediate "sync only"
functions. I expect that in typical use, sync statements will never get
so large or so frequent that the supply of sync tokens becomes a
concern. Moreover, sync tokens need not be compile-time constants: just
knowing their type is enough.
- It is flexible in terms of the actual synchronization code that can
get built. Again, the semantics are just simple barriers, but there are
many different ways of implementing these, some of which are
architecture- and/or OS-specific. sync's [how] parameter is intended to
be a way to "get the right code" in any situation where that is
important for performance (or whatever other) reasons. For example, on ia64:
sync AtomicOpBusyWait ubyte (1, 2, 3, ..., 15, 16)
could be codegenned to a tight busy loop using ld.acq, spinning until
two 8-byte words become exactly 0, while the matching
sync AtomicOpBusyWait ubyte (x)
instructions in the worker threads' code could be codegenned to tight
ld.acq/cmpxchg.rel busy loops writing zero bytes into the appropriate slots.
(Roughly, the idea is that LLVM backends supporting atomic ops would
offer at least the following sync "styles":
AtomicOpBusyWait - for minimum latency, busy wait using atomic ops
to get/put the sync token
AtomicOpYielding - for increased politeness, codegen to a busyloop
with something like nanosleep(50000) inside.)
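For example (not a proposed implementation, and ignoring the
acquire/release details that the ld.acq/cmpxchg.rel sequences above take
care of), the two styles might differ only in what the waiter does
between polls; post_token, wait_token_busy and wait_token_yielding are
made-up helpers operating on one shared token byte:

#include <time.h>

/* One byte of shared memory per sync token; nonzero = not yet sync'd. */

static void post_token(volatile unsigned char *slot) {
    *slot = 0;                           /* real codegen: atomic store w/ release */
}

static void wait_token_busy(volatile unsigned char *slot) {
    while (*slot != 0)                   /* AtomicOpBusyWait: spin flat out */
        ;
}

static void wait_token_yielding(volatile unsigned char *slot) {
    struct timespec ts = { 0, 50000 };   /* ~50us between polls */
    while (*slot != 0)                   /* AtomicOpYielding: be polite */
        nanosleep(&ts, 0);
}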
- If forks are replaced with calls, and syncs are replaced with NOPs,
then threaded code without any race conditions should continue to work
correctly in a single-threaded environment.
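In other words, a trivial "serial" lowering keeps race-free programs
working; something like (hypothetical names, just to make the point):

/* Degenerate single-threaded lowering: fork becomes an ordinary call,
 * sync becomes a no-op. */
static void serial_fork(void (*fn)(void *), void *args) {
    fn(args);                        /* run the "forked" function to completion */
}

static void serial_sync(const unsigned *tokens, int ntokens) {
    (void)tokens; (void)ntokens;     /* nothing to wait for */
}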
- A simple C library wrapping various OS/arch-specific threading
primitives is all that's required to implement the basic support
library. LLVM backends can simply lower fork and sync to calls to
functions in this library for some basic level of support, though
eventually the backends would probably want to emit synchronization code
directly, for performance.
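One possible (purely illustrative) shape for that library's interface -
the names and the enum are made up - which a backend could emit calls to
until it learns to inline the synchronization code itself:

/* The [how] hint is passed through so the library can pick a strategy. */
enum llvm_sync_how {
    LLVM_SYNC_DEFAULT,
    LLVM_SYNC_ATOMIC_BUSYWAIT,
    LLVM_SYNC_ATOMIC_YIELDING,
    LLVM_SYNC_SIGNAL
};

/* fork void %f(<args>): spawn an independent thread running f(args). */
void __llvm_thread_fork(void (*f)(void *), void *args);

/* sync [how] <type> (t1, ..., tn): block until every ti has been matched. */
void __llvm_thread_sync(enum llvm_sync_how how,
                        const unsigned *tokens, int ntokens);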
Other random thoughts:
- I don't really have a strong opinion on what to do about syncs that
are unpaired. We could either do some work and complain at compile
time, or simply have such syncs hang. The latter makes more sense to me.
- An important question when generating code: how do you know who is
the "forker" and who is the "forkee" if all you have are sync
instructions that look alike? There are a few answers:
a) If all sync tokens are compile-time constants, then it actually
shouldn't matter (except maybe for performance). All tokens (and by
extension, their respective sync instructions) will be able to be paired
at compile-time, and questions of "which way around" simply don't matter
- you'll get the right semantics (since you'll be able to pair things up
correctly), but possibly poor performance.
b) If not, the (global) CFG can be traversed to see who is forking
and who is being forked. The only tricky case is where this involves a
proper graph and not a tree (i.e. two bits of code mutually forking each
other): at that point you can either again make arbitrary forker/forkee
decisions, or use the [how] parameter to make the distinction clear
where it's important.
- One possible gotcha with sync's optional [how] parameter (apart from
having to implement all the different sync styles, possibly across
different platforms) is that a sync of one [how] type needs to be mated
with a sync of a compatible [how] type.
- It may be useful to add a [how] option to the fork instruction: most
commonly used operating systems have more than one way of forking a
thread, and sometimes the choice of which to use can be important for
performance. (You know who you are. ;) It may be good to expose this to
LLVM.
OK, that's enough of a ramble for one email. Once again, if anyone
has any comments, I'd be most grateful.
Thanks in advance,
Duraid
P.S. I've been told that Misha Brukman implemented something similar to
this at one point, but have been too busy^Wlazy to chase him up about it
yet. Misha, if you read this, please yell at me! :)