[LLVMdev] ASM output with JIT / codegen barriers

Mon Jan 4 01:35:57 PST 2010

Responding to the original email...

On Sun, Jan 3, 2010 at 10:10 PM, James Y Knight <foom at fuhm.net> wrote:
> In working on an LLVM backend for SBCL (a lisp compiler), there are
> certain sequences of code that must be atomic with regards to async
> signals.

Can you define exactly what 'atomic with regards to async signals'
entails? Your description led me to think you may mean something
other than the POSIX definition, but maybe I'm just misinterpreting
it. Are these signals guaranteed to run in the same thread? On the
same processor? Is there concurrent code running in the address space
when they run?

<snip, this seems to be well handled on a sibling email...>

> Additionally, I think there will be some situations where a particular
> ordering of memory operations is required. LLVM makes no guarantees
> about the order of stores, unless there's some way that you could tell
> the difference in a linear program. Unfortunately, I don't have a
> linear program, I have a program which can run signal handlers between
> arbitrary instructions. So, I think I'll need something like an
> llvm.memory.barrier of type "ss", except only affecting the codegen,
> not actually inserting a processor memory barrier.

The processor can reorder memory operations as well (within limits).
Consider that a large 'memset' to zero can be codegened to
non-temporal stores to memory. This exempts it from all ordering
considerations except for an explicit memory fence in the processor.
If code were to execute between such a store and the fence, the
contents of the memory could read
"andthenumberofcountingshallbethree", or 'feedbeef', or '0000...' or
'11111...'; there's just no telling.

> Is there already some way to insert a codegen-barrier with no
> additional runtime cost (beyond the opportunity-cost of not being able
> to reorder/delete stores across the barrier)? If not, can such a thing
> be added? On x86, this is a non-issue, since the processor already
> implicitly has inter-processor store-store barriers, so using:
>   call void @llvm.memory.barrier(i1 0, i1 0, i1 0, i1 1, i1 0)
> is fine: it's a noop at runtime but ensures the correct sequence of
> stores...but I'm thinking ahead here to other architectures where that
> would actually require expensive instructions to be emitted.

But... if it *did* require expensive instructions, wouldn't you want
them?!?! The reason we don't emit any on x86 is because of its memory
ordering guarantees. If it didn't have them, we would emit
instructions to impose that ordering, because otherwise the wrong
thing might happen. I think you should trust LLVM to emit expensive
instructions for the ordering semantics you specify only when they
are necessary for the architecture, and file bugs if it ever fails.
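
To make that concrete, here is a minimal sketch in C++ rather than
LLVM IR, assuming a C++11 compiler with <atomic>. The names
shared_payload, payload_ready, publish and try_consume are made up for
illustration. The source states only the ordering it needs and the
compiler picks the instructions: on x86 the release and acquire fences
below typically emit no instruction at all and only restrict compiler
reordering, while on weaker-ordered targets such as ARM or POWER they
typically become a real barrier instruction.

```cpp
#include <atomic>

// Hypothetical shared state, for illustration only.
int shared_payload = 0;
std::atomic<bool> payload_ready{false};

// Writer side: store the data, then publish the flag.
void publish(int value) {
    shared_payload = value;
    // Store-ordering request: the payload store must not move below the
    // flag store. Whether this costs an instruction is the target's call.
    std::atomic_thread_fence(std::memory_order_release);
    payload_ready.store(true, std::memory_order_relaxed);
}

// Reader side: returns false until the flag has been published.
bool try_consume(int& out) {
    if (!payload_ready.load(std::memory_order_relaxed)) {
        return false;
    }
    // Pairs with the release fence in publish().
    std::atomic_thread_fence(std::memory_order_acquire);
    out = shared_payload;
    return true;
}
```

publish() is meant to run on one thread and try_consume() on another.
The code is the same on every target; only the emitted instructions
differ.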

The only useful case I can think of is if you happen to know that you
execute on some "uniprocessor" with at most one thread of execution,
and thus gain memory ordering guarantees beyond those which can be
assumed across an entire architecture (this is certainly true for
x86). If it is useful to leverage this to optimize codegen, it should
be done at the target level, with some target options to specify that
consistency assumptions should be stronger than normal. The intrinsics
and semantics should remain the same regardless.
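
For comparison, the single-thread case (a signal handler interrupting
the thread that installed it) is one where C++11 separates the two
ideas at the source level: std::atomic_signal_fence constrains only
compiler reordering and emits no fence instruction, while
std::atomic_thread_fence may emit one. A minimal sketch, assuming a
C++11 compiler and that the handler runs on the raising thread. The
names sig_payload, sig_ready, sig_observed, on_signal and
run_signal_fence_demo are hypothetical, and std::raise stands in for a
real asynchronous signal.

```cpp
#include <atomic>
#include <csignal>

// Hypothetical state shared between the main flow and its signal handler.
std::atomic<int>  sig_payload{0};
std::atomic<bool> sig_ready{false};
std::atomic<int>  sig_observed{-1};

extern "C" void on_signal(int) {
    if (sig_ready.load(std::memory_order_relaxed)) {
        // Compiler-only ordering: the payload load stays after the flag load.
        std::atomic_signal_fence(std::memory_order_acquire);
        sig_observed.store(sig_payload.load(std::memory_order_relaxed),
                           std::memory_order_relaxed);
    }
}

// Returns the payload seen by the handler, or -1 if setup failed.
int run_signal_fence_demo() {
    auto previous = std::signal(SIGINT, on_signal);
    if (previous == SIG_ERR) {
        return -1;
    }
    sig_payload.store(42, std::memory_order_relaxed);
    // Compiler-only ordering: the payload store stays before the flag store.
    std::atomic_signal_fence(std::memory_order_release);
    sig_ready.store(true, std::memory_order_relaxed);
    std::raise(SIGINT);             // handler runs on this thread before raise returns
    std::signal(SIGINT, previous);  // restore the earlier disposition
    return sig_observed.load(std::memory_order_relaxed);
}
```

This only covers a handler running on the same thread. If another
thread or processor can observe the stores, the thread fence is still
the right tool.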