[LLVMdev] ASM output with JIT / codegen barriers

Sun Jan 3 22:10:38 PST 2010

In working on an LLVM backend for SBCL (a Lisp compiler), I have
certain sequences of code that must be atomic with respect to async
signals. For example, on x86, a single SUB on a memory location
should be used, not a load/sub/store sequence. LLVM's IR doesn't
currently have any way to express this kind of constraint (and
really, a general mechanism is essentially impossible, since
different architectures offer different possibilities, so I'm not
asking for one).

All I really would like is to be able to specify the exact instruction  
sequence to emit there. I'd hoped that inline asm would be the way to  
do so, but LLVM doesn't appear to support asm output when using the  
JIT compiler. Is there any hope for inline asm being supported with  
the JIT anytime soon? Or is there an alternative suggested way of  
doing this? I'm using llvm.atomic.load.sub.i64.p0i64 for the moment,
but that's both more expensive than I need, since it emits an
unnecessary LOCK prefix, and theoretically incorrect. While it
currently generates correct code on x86-64, LLVM doesn't actually
*guarantee* that it generates a single instruction; that's just
"luck".
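
To make the requirement concrete, here is a sketch in C of the
instruction sequence being asked for, using GCC/Clang extended inline
asm. Several things here are assumptions and not part of the original
setup: the helper name sub_in_place is hypothetical, the asm path
assumes x86-64 with AT&T syntax, and the fallback for other targets is
a relaxed atomic builtin that is not guaranteed to be a single
instruction. Without a LOCK prefix the SUB is not atomic with respect
to other processors, but a signal handler on the same thread can only
run between instructions, so it never observes a half-finished update.

```c
#include <stdint.h>

/* Hypothetical helper: subtract n from *p using one read-modify-write
 * instruction where the target allows it. */
static inline void sub_in_place(int64_t *p, int64_t n) {
#if defined(__x86_64__)
    /* A single SUB with a memory destination and no LOCK prefix.
     * "+m" marks *p as read and written in memory; "cc" says the
     * flags register is clobbered. */
    __asm__ volatile("subq %1, %0"
                     : "+m"(*p)
                     : "r"(n)
                     : "cc");
#else
    /* Assumption: stand-in for other targets. This may expand to
     * several instructions and so does not give the same guarantee. */
    (void)__atomic_fetch_sub(p, n, __ATOMIC_RELAXED);
#endif
}

/* Small driver used only to exercise the helper. */
int64_t sub_demo(int64_t start, int64_t n) {
    int64_t counter = start;
    sub_in_place(&counter, n);
    return counter;
}
```

The point of spelling it out as asm is that the compiler cannot split
the operation into a separate load, subtract, and store.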

Additionally, I think there will be some situations where a particular
ordering of memory operations is required. LLVM makes no guarantees
about the order of stores, except where a linear (single-threaded,
uninterrupted) program could tell the difference. Unfortunately, I
don't have a linear program; I have a program which can run signal
handlers between arbitrary instructions. So, I think I'll need
something like an llvm.memory.barrier of type "ss", except one that
only constrains codegen and doesn't actually insert a processor
memory barrier.

Is there already some way to insert a codegen barrier with no
additional runtime cost (beyond the opportunity cost of not being able
to reorder/delete stores across the barrier)? If not, can such a thing
be added? On x86, this is a non-issue, since the processor already
implicitly provides inter-processor store-store ordering, so using:
   call void @llvm.memory.barrier(i1 0, i1 0, i1 0, i1 1, i1 0)
is fine: it's a no-op at runtime but ensures the correct sequence of
stores. But I'm thinking ahead here to other architectures, where that
call would actually require expensive instructions to be emitted.
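
For comparison, C11 has a primitive with the semantics described
above: atomic_signal_fence from <stdatomic.h> orders memory operations
only between a thread and a signal handler running on that same
thread, so it restricts compiler reordering without emitting a
hardware fence. The sketch below is C, not LLVM IR, and the names
payload, ready, observed, publish, and publish_then_interrupt are
hypothetical; it only illustrates the store ordering I want, not how
an LLVM backend would express it.

```c
#include <signal.h>
#include <stdatomic.h>

static atomic_int payload  = 0;   /* data written first */
static atomic_int ready    = 0;   /* flag written second */
static atomic_int observed = -1;  /* what the handler saw */

/* Handler: if the flag is set, the payload store must already be
 * visible, because the writer fenced between the two stores. */
static void on_signal(int signo) {
    (void)signo;
    if (atomic_load_explicit(&ready, memory_order_relaxed)) {
        atomic_signal_fence(memory_order_acquire);
        atomic_store_explicit(
            &observed,
            atomic_load_explicit(&payload, memory_order_relaxed),
            memory_order_relaxed);
    }
}

/* Writer: the signal fence keeps the compiler from moving the flag
 * store ahead of the payload store, at no runtime cost. */
void publish(int value) {
    atomic_store_explicit(&payload, value, memory_order_relaxed);
    atomic_signal_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Driver: installs the handler, publishes, then raises the signal
 * synchronously so the result is deterministic. */
int publish_then_interrupt(int value) {
    signal(SIGINT, on_signal);
    publish(value);
    raise(SIGINT);
    return atomic_load_explicit(&observed, memory_order_relaxed);
}
```

The driver raises the signal after both stores, so it only shows the
pattern running; the fence matters when a real asynchronous signal
lands between the two stores.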

Thanks,
James