[LLVMdev] Atomic Operation and Synchronization Proposal v2
Torvald Riegel
torvald at se.inf.tu-dresden.de
Thu Jul 12 05:23:32 PDT 2007
Here are some comments; quotes are from the draft.
> an operation based constraint cannot guard other operations
I think constraints associated with a particular instruction usually apply
to that instruction and to preceding/subsequent instructions as well, so
this wouldn't be true in general. That is the case in the atomic_ops model,
and I believe also on IA-64.
> The single instruction constraints can, at their most flexible, constrain
> any set of possible pairings of loads from memory and stores to memory
I'm not sure about this, but could we run into issues with "special" kinds
of data transfers (such as vector operations, DMA, ...)? Memcpy
implementations could be one thing to look at.
This largely comes down to how universal you want the memory model to be.
Regarding swap(): which uses do you have in mind? To me, support for TAS
would be more useful, as it is used for spinlocks (and no concurrent
algorithm that uses swap() but couldn't use TAS immediately comes to mind).
The difference is that TAS can have a weaker interface, because it operates
on a boolean value only, making it easier to map onto different hardware
(and TAS is faster than CAS on some architectures, I think). For example, a
byte is sufficient for TAS, whereas with swap you would probably require
the exchanged value to have machine-word size.
> These implementations assume a very conservative interpretation.
> result = __sync_fetch_and_add( <ty>* ptr, <ty> value )
> call void @llvm.atomic.membarrier( i1 true, i1 true, i1 true,
> i1 true )
> %result = call <ty> @llvm.atomic.las( <ty>* %ptr, <ty> %value )
Shouldn't you have a second membar after the las() to be very conservative
(i.e., if las() is really supposed to be linearizable)? Otherwise, the
effects of the las() can be reordered with respect to the effects of
subsequent instructions.
This also shows that you get additional overhead in the codegen if barriers
are not associated with an operation: to emit efficient code, codegen would
have to check whether membars are adjacent to an operation and whether they
can be merged with it. If codegens don't do this, runtime overhead will be
higher due to the unnecessary barriers. This also implies that there must
be no reordering of las() with other operations until the codegen phase, so
you would at least need some sort of compiler barrier, or require the las()
target to be volatile (do LLVM's volatile ordering guarantees apply to
non-volatile loads/stores as well, or just to volatile ones?).
If you use the other approach instead (ordering constraints attached to
instructions), then you have to support more intrinsics, but you don't need
this kind of analysis, and you wouldn't need compiler reordering barriers,
assuming they are implicit for anything that carries a reordering
constraint.
I would guess that with the second approach (constraints on operations),
codegen implementations could actually be easier, because you can
concentrate on just one operation with its constraints.
> result = __sync_fetch_and_or( <ty>* ptr, <ty> value )
or/and/... should be added to the list of supported intrinsics at some
point. x86 has built-in support for these and doesn't need the CAS loop.
Torvald