[LLVMdev] ASM output with JIT / codegen barriers
James Y Knight
foom at fuhm.net
Mon Jan 4 13:13:40 PST 2010
On Jan 4, 2010, at 4:35 AM, Chandler Carruth wrote:
> Responding to the original email...
>
> On Sun, Jan 3, 2010 at 10:10 PM, James Y Knight <foom at fuhm.net> wrote:
>> In working on an LLVM backend for SBCL (a lisp compiler), there are
>> certain sequences of code that must be atomic with regards to async
>> signals.
>
> Can you define exactly what 'atomic with regards to async signals'
> entails? Your descriptions led me to think you may mean something
> other than the POSIX definition, but maybe I'm just misinterpreting
> it. Are these signals guaranteed to run in the same thread? On the
> same processor? Is there concurrent code running in the address space
> when they run?
Hi, thanks everyone for all the comments. I think maybe I wasn't clear
that I *only* care about atomicity w.r.t. a signal handler
interruption in the same thread, *not* across threads. Therefore, many
of the problems of cross-CPU atomicity are not relevant. The signal
handler gets invoked via pthread_kill, and is thus necessarily running
in the same thread as the code being interrupted. The memory in
question can be considered thread-local here, so I'm not worried about
other threads touching it at all.
I also realize I had (at least :) one error in my original email: of
course, the atomic operations LLVM provides *ARE* guaranteed to do the
right thing w.r.t. atomicity against signal handlers; they in fact
just do more than I need, not less. I'm not sure why I earlier thought
they were both more and less than I needed; sorry if that confused
you about what I'm trying to accomplish.
Here's a concrete example, in hopes it will clarify matters:
@pseudo_atomic = thread_local global i64 0
declare i64* @alloc(i64)
declare void @do_pending_interrupt()
declare i64 @llvm.atomic.load.sub.i64.p0i64(i64* nocapture, i64) nounwind
declare void @llvm.memory.barrier(i1, i1, i1, i1, i1)
define i64* @foo() {
;; Note that we're in an allocation section
store i64 1, i64* @pseudo_atomic
;; Barrier only to ensure instruction ordering, not needed as a true memory barrier
call void @llvm.memory.barrier(i1 0, i1 0, i1 0, i1 1, i1 0)
;; Call might actually be inlined, so cannot depend upon unknown call causing correct codegen effects.
%obj = call i64* @alloc(i64 32)
%obj_header = getelementptr i64* %obj, i64 0
store i64 5, i64* %obj_header ;; store obj type (5) in header word
%obj_len = getelementptr i64* %obj, i64 1
store i64 2, i64* %obj_len ;; store obj length (2) in length slot
;; ...etc...
;; Check if we were interrupted:
%res = call i64 @llvm.atomic.load.sub.i64.p0i64(i64* @pseudo_atomic, i64 1)
;; The old value is still 1 unless a signal handler changed it to 2
%was_interrupted = icmp ne i64 %res, 1
br i1 %was_interrupted, label %do-interruption, label %continue
continue:
ret i64* %obj
do-interruption:
call void @do_pending_interrupt()
br label %continue
}
A signal handler checks the thread-local @pseudo_atomic variable: if
it is already set, the handler just changes the value to 2 and
returns, relying on do_pending_interrupt to reinvoke it at the end of
the pseudo-atomic section. It defers like this because it could
otherwise be confused by the partially built proto-object in this code.
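For comparison, roughly the same protocol can be sketched in C11, assuming a compiler with `<stdatomic.h>` and POSIX `sigaction`. This is an illustration, not SBCL's runtime: `foo`, `on_signal`, `service_interrupt`, `do_pending_interrupt` and the `simulate_signal` hook are all hypothetical, `malloc` stands in for `@alloc`, and `raise()` plays the role of the same-thread `pthread_kill`. `atomic_signal_fence` takes the place of the "instruction ordering only" `@llvm.memory.barrier` call; in GCC and Clang it constrains only compiler reordering and emits no fence instruction. The final relaxed `atomic_fetch_sub` still compiles to a LOCK-prefixed instruction on x86, so this shows the semantics rather than the zero-synchronization codegen being asked for.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>

/* 0 = outside a section, 1 = inside, 2 = a signal arrived inside and was deferred. */
static _Thread_local atomic_long pseudo_atomic;
/* Demo-only count of interrupts actually serviced. */
static _Thread_local atomic_int serviced;

/* Hypothetical stand-in for the real interrupt work. */
static void service_interrupt(void) {
    atomic_fetch_add_explicit(&serviced, 1, memory_order_relaxed);
}

/* Hypothetical stand-in for SBCL's do_pending_interrupt. */
static void do_pending_interrupt(void) {
    /* Only reached when a deferred signal left a nonzero value behind. */
    assert(atomic_load_explicit(&pseudo_atomic, memory_order_relaxed) != 0);
    atomic_store_explicit(&pseudo_atomic, 0, memory_order_relaxed);
    service_interrupt();
}

static void on_signal(int sig) {
    (void)sig;
    if (atomic_load_explicit(&pseudo_atomic, memory_order_relaxed) != 0) {
        /* Inside a section: the object may be half-built, so just defer. */
        atomic_store_explicit(&pseudo_atomic, 2, memory_order_relaxed);
        return;
    }
    service_interrupt();
}

static void install_handler(void) {
    struct sigaction sa;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;               /* handler stays installed after each call */
    sa.sa_handler = on_signal;
    sigaction(SIGUSR1, &sa, NULL);
}

/* Rough C analogue of @foo; simulate_signal is a demo-only hook. */
static long *foo(int simulate_signal) {
    atomic_store_explicit(&pseudo_atomic, 1, memory_order_relaxed);
    atomic_signal_fence(memory_order_seq_cst);   /* compiler-only barrier */

    long *obj = malloc(4 * sizeof *obj);         /* stand-in for @alloc(32) */
    if (obj == NULL)
        abort();
    obj[0] = 5;                                  /* object type in header word */
    obj[1] = 2;                                  /* object length */
    if (simulate_signal)
        raise(SIGUSR1);                          /* same-thread "interruption" */

    atomic_signal_fence(memory_order_seq_cst);
    long old = atomic_fetch_sub_explicit(&pseudo_atomic, 1, memory_order_relaxed);
    if (old != 1)                                /* the handler changed it to 2 */
        do_pending_interrupt();
    return obj;
}
```

Calling `foo(1)` raises `SIGUSR1` while the object is half-built; the handler only records the interruption, and the deferred work runs once, from `do_pending_interrupt`, after the section ends.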
The sequence that SBCL's own code generator emits today is
basically:
MOV <pseudo_atomic>, 1
[[do allocation, fill in object, etc]]
XOR <pseudo_atomic>, 1
JZ continue
<<call do_pending_interrupt>>
continue:
...
The important things here are:
1) The code generator must not move stores from between the MOV and
the XOR to outside of them.
2) An interruption can never be missed: the XOR is atomic with regard
to signals delivered in the same thread, since it either executes
fully (both load and store) or not at all. I don't care whether it's
visible on other CPUs: it's a thread-local variable in any case.
Those are the two properties I'd like to get from LLVM, without ever
emitting superfluous processor synchronization (such as a LOCK prefix
or a fence instruction).
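Property 2 does have a portable, if not free, expression in C11: a lock-free atomic read-modify-write is indivisible with respect to a signal handler running in the same thread. The sketch below mirrors the XOR/JZ tail; `pseudo_atomic`, `end_pseudo_atomic`, `do_pending_interrupt` and the `pending_serviced` counter are hypothetical stand-ins. The catch is the synchronization cost mentioned above: GCC and Clang implement `atomic_fetch_xor` on x86 with a LOCK-prefixed instruction (or a LOCK CMPXCHG loop), whereas the plain XOR that SBCL emits already gets the same-thread guarantee because a signal can only be delivered between instructions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state: 1 = inside a section, 2 = a signal was deferred. */
static _Thread_local atomic_long pseudo_atomic;
static _Thread_local int pending_serviced;  /* demo-only counter */

/* Hypothetical stand-in for SBCL's do_pending_interrupt. */
static void do_pending_interrupt(void) {
    /* After the XOR, a deferred signal leaves 2 ^ 1 == 3 behind. */
    assert(atomic_load_explicit(&pseudo_atomic, memory_order_relaxed) != 0);
    atomic_store_explicit(&pseudo_atomic, 0, memory_order_relaxed);
    pending_serviced++;
}

/* Mirrors "XOR <pseudo_atomic>, 1 / JZ continue" as one atomic RMW, so a
   same-thread signal sees the value either before or after the XOR, never
   a half-done load/store pair. Returns true if an interrupt was pending. */
static bool end_pseudo_atomic(void) {
    long old = atomic_fetch_xor_explicit(&pseudo_atomic, 1, memory_order_relaxed);
    if ((old ^ 1) == 0)
        return false;          /* was 1: no interruption, now 0 */
    do_pending_interrupt();
    return true;
}
```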
> The processor can reorder memory operations as well (within limits).
> Consider that 'memset' to zero is often codegened to a non-temporal
> store to memory. This exempts it from all ordering considerations
My understanding is that processor reordering only affects what you
might see from another CPU: the processor will undo speculatively
executed operations if the sequence of instructions actually executed
is not the sequence it predicted, so within a single CPU you should
never be able to tell the difference.
But I must admit I don't know anything about non-temporal stores.
Within a single thread, if I do a non-temporal store followed by a
load from the same address, am I not guaranteed to get back the value
I stored?
James