[cfe-dev] Adding sanity to the Atomics implementation

Thu Jan 28 02:43:38 PST 2016

On 27 Jan 2016, at 19:55, James Knight via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> Right now, the atomics implementation in clang is a bit of a mess. It has three basically completely distinct code-paths:

I’m partly to blame for some of this.  Sorry!

> Alternative A: Move all atomic libcall emission into LLVM
> 
> In this alternative, LLVM will be responsible for lowering all atomic operations (to inline code or libcalls as appropriate) for all three code-paths listed at the beginning. Clang will emit no libcalls for atomic operations itself.

We had a discussion about this around 2011/2012.  The general consensus was that everyone thought that it was a good idea, no one had time to implement it.  If you have time, then that is wonderful news!

Note that part of this will involve generalising the atomic operations (at least cmpxchg) to work on *any* LLVM IR type, including vectors and pointers.

> A1) In LLVM: when lowering "store atomic", "load atomic", "atomicrmw", and "cmpxchg" instructions that aren't supported by the target, emit libcalls to the new __atomic_* functions, (rather than the current behavior of calling the legacy __sync_* functions.)
> 
> Additionally, I'd like to move the decision to emit a libcall into AtomicExpandPass, and remove the ability to use Expand in ISel to create a libcall for ISD::ATOMIC_*. Putting the decision there allows querying the target, up front, for its supported sizes, before any other lowering decisions (either other parts of AtomicExpandPass and in ISel). When choosing whether to inline or libcall, only the size and alignment of the operation should be considered, and not the operation itself, or the type. This will ensure that the "all or none" property is adhered to.

I’m not 100% sure about this.  On some architectures, it may be advantageous to lower some operations to libcalls purely because of the size of the implementation, or because of ABI stability (e.g. all architecture versions can do a store safely, but on some the other operations need locks / restart blocks on some versions and can do it in native instructions on others, and you want the ability to select between them at load time so that the same binary can work on all versions of the architecture).  

> (Q: what about the implementation of __atomic_is_lock_free/__atomic_always_lock_free in clang? The clang frontend can't query target information from the llvm backend, can it? Is there some other way to expose the information of what sizes are supported by a backend so that the clang builtins -- the latter of which must be a C++ constant expression -- can also use it? Or...just continue duplicating the info in both places, as is done today…)

I think that this will probably need to end up duplicating the info, but it’s a bit horrible.  Ideally, I’d propose extending the DataLayout to include it, which at least means that you will get a hard failure if clang and the back end disagree.

> A2) In clang, start removing the code that knows how to do optimized library call lowering for atomics -- always emit llvm atomic ops. Except for one case: clang will still need to emit library calls itself for data not aligned naturally for its size. The LLVM atomic instructions currently will not handle unaligned data, but unaligned data is allowed for the four "slab of memory" builtins (__atomic_load, __atomic_store, __atomic_compare_exchange, and __atomic_exchange).
> 
> A3) In LLVM, add "align" attributes to cmpxchg and atomicrmw, and allow specifying "align" values for "load atomic" and "store atomic" (where the attribute currently exists but cannot be used). LLVM will then be able to lower misaligned accesses to library calls. In clang, get rid of special case for directly emitting libcalls for misaligned atomics, and lower to llvm instructions there, as well.
> 
> A4) Hopefully, now, the three codepaths in clang will be sufficiently the same (as they're all now just creating llvm atomic instructions) that they can be trivially merged, or else they are so trivial that the remaining duplication is not worthwhile to merge.

One other consideration is the lowering of _Atomic(T).  C11 is very careful to not require that sizeof(T) == sizeof(_Atomic(T)) (though there are some bugs in the spec currently), to allow for architectures where sub-word atomics are inefficient (for example, _Atomic(char) may be implemented as a 16- or 32-bit quantity, with explicit truncation).  Clang currently handles this (possibly incorrectly).

David