[LLVMdev] Atomic Operation and Synchronization Proposal v2

Torvald Riegel torvald at se.inf.tu-dresden.de
Thu Jul 12 05:23:32 PDT 2007


Here are some comments, quotes are from the draft.

> an operation based constraint cannot guard other operations

I think constraints associated with a particular instruction usually apply 
to that instruction and to the preceding/subsequent instructions, so this 
claim does not hold in general. That is how the atomic_ops model works, and 
I believe it is also the case on IA-64.


> The single instruction constraints can, at their most flexible, constrain 
> any set of possible pairings of loads from memory and stores to memory

I'm not sure about this, but could we run into issues with "special" kinds 
of data transfers (vector operations, DMA, ...)? Memcpy implementations 
would be one thing to look at.
This largely comes down to how universal you want the memory model to be.

Regarding swap(): which uses do you have in mind? To me, support for 
test-and-set (TAS) would be more useful, since it is what spinlocks use 
(and no concurrent algorithm comes to mind that uses swap() but couldn't 
use TAS). The difference is that TAS can have a weaker interface, because 
it operates on a boolean value only, which makes it easier to map onto 
different hardware (and I believe TAS is faster than CAS on some 
architectures). For example, a single byte is sufficient for TAS, whereas 
with swap() you would probably have to require that the exchanged value is 
machine-word sized.
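To make the point concrete, here is a minimal sketch (my own illustration, not from the draft) of a TAS spinlock in C11. `atomic_flag` is exactly the boolean-only interface described above, and implementations may back it with a single byte (e.g. x86 `xchgb`, SPARC `ldstub`):

```c
#include <stdatomic.h>

/* A TAS-based spinlock: only a boolean "is it held?" is needed. */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

static void tas_acquire(void) {
    /* test_and_set returns the previous value; spin until we are
       the one who flipped it from clear to set. */
    while (atomic_flag_test_and_set(&lock_flag))
        ; /* spin */
}

static void tas_release(void) {
    atomic_flag_clear(&lock_flag);
}
```

A word-sized swap() could implement the same lock, but it forces a wider memory operand than the algorithm actually needs.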


> These implementations assume a very conservative interpretation. 
> result = __sync_fetch_and_add( <ty>* ptr, <ty> value )
>             call void @llvm.atomic.membarrier( i1 true, i1 true, i1 true,
>               i1 true ) 
> %result   = call <ty> @llvm.atomic.las( <ty>* %ptr, <ty> %value )

Shouldn't you have a second membar after the las() to be truly conservative 
(i.e., if las() is supposed to be linearizable)? Otherwise, the effects of 
the las() can be reordered with respect to the effects of subsequent 
instructions.
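In C terms (my own sketch, using the GCC __sync builtins this proposal mirrors), the fully conservative expansion would fence on both sides of the read-modify-write, not only before it:

```c
/* Conservative (linearizable) fetch-and-add: a full barrier on
   *both* sides of the atomic operation, matching the point above
   that a single membar before the las() is not enough. */
static int fetch_and_add_conservative(int *ptr, int value) {
    __sync_synchronize();                        /* membar before */
    int old = __sync_fetch_and_add(ptr, value);  /* the las()     */
    __sync_synchronize();                        /* membar after  */
    return old;
}
```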

This also shows that you get additional overhead in codegen if barriers are 
not associated with an operation: to emit efficient code, codegen has to 
check whether membars are adjacent to an operation and whether they can be 
merged with it. If a codegen doesn't do this, runtime overhead will be 
higher due to the unnecessary barriers. It also implies that there must be 
no reordering of las() with other operations before the codegen phase, so 
you would at least need some sort of compiler barrier, or require the las() 
target to be volatile (do LLVM's volatile ordering guarantees apply to 
non-volatile loads/stores as well, or only to volatile ones?).

If you instead use the other approach (ordering constraints attached to 
instructions), then you have to support more intrinsics, but you don't need 
this kind of analysis, and you wouldn't need compiler reordering barriers, 
assuming they are implicit in anything that carries a reordering 
constraint.

I would guess that with the second approach (constraints attached to 
operations), codegen implementations could actually be easier, because you 
can concentrate on just one operation together with its constraints.
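This is essentially the style C11 later standardized (my example, for illustration): the ordering constraint is a parameter of the atomic operation itself, so there is no separate fence for codegen to locate and merge:

```c
#include <stdatomic.h>

/* "Constraint attached to the operation": the memory order travels
   with the fetch-and-add, instead of living in a separate membar
   that codegen would have to pair up with it. */
static int fetch_add_release(atomic_int *ctr, int v) {
    return atomic_fetch_add_explicit(ctr, v, memory_order_release);
}
```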


> result = __sync_fetch_and_or( <ty>* ptr, <ty> value )

or/and/... should be added to the list of supported intrinsics at some 
point. x86 has built-in support for these and doesn't need a CAS loop.
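For reference, the GCC builtin already exposes this (again my sketch, not from the draft); on x86, when the old value is unused, it can lower to a single `lock or` instruction rather than a compare-and-swap loop:

```c
/* Atomic fetch-and-or, e.g. for setting flag bits in a shared word.
   Returns the value the word held before the or. */
static unsigned set_flags(unsigned *word, unsigned mask) {
    return __sync_fetch_and_or(word, mask);
}
```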


Torvald
