[LLVMdev] ASM output with JIT / codegen barriers
James Y Knight
foom at fuhm.net
Sun Jan 3 22:10:38 PST 2010

While working on an LLVM backend for SBCL (a Lisp compiler), I've found
certain sequences of code that must be atomic with regard to async
signals. For example, on x86, a single SUB on a memory location
should be used, not a load/sub/store sequence. LLVM's IR doesn't
currently have any way to express this kind of constraint (and
really, that's essentially impossible, since different architectures
have different possibilities, so I'm not asking for this).

All I would really like is to be able to specify the exact instruction
sequence to emit there. I'd hoped that inline asm would be the way to
do so, but LLVM doesn't appear to support asm output when using the
JIT compiler. Is there any hope for inline asm being supported with
the JIT anytime soon? Or is there an alternative suggested way of
doing this? I'm using llvm.atomic.load.sub.i64.p0i64 for the moment,
but that's both more expensive than I need as it has an unnecessary
LOCK prefix, and is also theoretically incorrect. While it currently
generates correct code on x86-64, LLVM doesn't actually *guarantee*
that it generates a single instruction; that's just "luck".
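
As a rough illustration of the single-instruction requirement, here is a C sketch using GCC/Clang extended inline asm (C rather than LLVM IR, so it only shows the intent, not something the JIT accepts). The helper name sub_in_place is hypothetical. On x86-64 it emits one SUB with a memory destination and no LOCK prefix; on any other target it falls back to the compiler's atomic builtin, which is the more expensive form described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: subtract n from *p using a single instruction
   where possible, so a signal handler on the same thread can never
   observe a half-finished load/sub/store sequence. */
static inline void sub_in_place(int64_t *p, int64_t n)
{
    assert(p != NULL);
#if defined(__x86_64__)
    /* AT&T syntax: "subq src, dst" computes dst = dst - src.
       "+m" makes *p a read-write memory operand; "er" allows a
       register or a sign-extended 32-bit immediate for n.
       No LOCK prefix: atomic against signals, not other CPUs. */
    __asm__ volatile("subq %1, %0"
                     : "+m"(*p)
                     : "er"(n)
                     : "cc");
#else
    /* Assumed fallback for other targets: a full atomic
       read-modify-write, stronger (and costlier) than needed. */
    __atomic_fetch_sub(p, n, __ATOMIC_RELAXED);
#endif
}
```

Calling sub_in_place(&counter, 3) on a counter holding 10 leaves it at 7 on either path; only the x86-64 path carries the single-instruction guarantee.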

Additionally, I think there will be some situations where a particular
ordering of memory operations is required. LLVM makes no guarantees
about the order of stores, unless there's some way that you could tell
the difference in a linear program. Unfortunately, I don't have a
linear program; I have a program which can run signal handlers between
arbitrary instructions. So I think I'll need something like an
llvm.memory.barrier of type "ss", except affecting only the codegen,
not actually inserting a processor memory barrier.

Is there already some way to insert a codegen barrier with no
additional runtime cost (beyond the opportunity cost of not being able
to reorder/delete stores across the barrier)? If not, can such a thing
be added? On x86, this is a non-issue, since the processor already
implicitly has inter-processor store-store barriers, so using:

    call void @llvm.memory.barrier(i1 0, i1 0, i1 0, i1 1, i1 0)

is fine: it's a no-op at runtime but ensures the correct sequence of
stores. But I'm thinking ahead here to other architectures, where that
would actually require expensive instructions to be emitted.
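
For comparison, the kind of codegen-only barrier asked about here can be sketched at the C level with C11's atomic_signal_fence, which orders operations only between a thread and a signal handler running on that same thread. It constrains compiler reordering and normally emits no machine instruction. This is an analogy in C, not an LLVM IR facility, and the names payload, ready, publish and consume are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

static int payload;        /* data written before the flag is set  */
static atomic_int ready;   /* flag checked by the signal handler   */

/* Mainline code: store the payload, then raise the flag. */
static void publish(int value)
{
    assert(value >= 0);    /* -1 is reserved as "not ready" below  */
    payload = value;
    /* Compiler-only fence: the payload store may not be moved
       below the flag store, but no hardware barrier is requested. */
    atomic_signal_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Intended to be called from a signal handler on the same thread.
   Returns -1 if the flag has not been raised yet. */
static int consume(void)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return -1;
    atomic_signal_fence(memory_order_acquire);
    return payload;
}
```

A handler that interrupts publish() at any point then sees either the flag clear or the flag set together with the stored payload, never the flag set with a stale payload.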
Thanks,
James