[llvm-dev] Adding sanity to the Atomics implementation

Sun Jan 31 00:42:37 PST 2016

----- Original Message -----
> From: "Ben via llvm-dev Craig" <llvm-dev at lists.llvm.org>
> To: llvm-dev at lists.llvm.org
> Sent: Thursday, January 28, 2016 9:41:06 AM
> Subject: Re: [llvm-dev] Adding sanity to the Atomics implementation
> 
> 
> I don't have much of substance to add, but I will say that I like the
> proposal (I too prefer Alternative A).

I also prefer alternative A.

 -Hal

> When adding C11 atomic
> support for hexagon, I found it surprising that support for the
> __sync_* was implemented completely differently than the C11
> atomics.
> 
> 
> On 1/27/2016 1:55 PM, James Knight via llvm-dev wrote:
> 
> 
> 
> Right now, the atomics implementation in clang is a bit of a mess. It
> has three basically completely distinct code-paths:
> 
>     1. There's the legacy __sync_* builtins, which clang lowers
>     directly to atomic IR instructions. Then, the llvm atomic IR
>     instructions themselves can sometimes emit libcalls to __sync_*
>     library functions (which are basically undocumented, and users
>     are often responsible for implementing themselves if they need
>     it).
>     2. There's the new __atomic_* builtins, which clang will,
>     depending on size and alignment and target, lower either to a
>     libcall to a " standardized-by-GCC " __atomic_* library function
>     (implemented by libatomic), or, to the atomic IR instructions.
>     (Also covered by the same code is the __c11_atomic_* builtins,
>     which are almost-identical clang-specific aliases of the
>     __atomic_* builtins.)
>     3. There's the C11 "_Atomic" type modifier and some OpenMP stuff,
>     which clang lowers into atomic IR or libcalls to __atomic_*
>     library functions, via almost completely separate code from #2.
>     (Note: doesn't apply to C++ std::atomic<>, which gets
>     implemented with the builtins instead). Beyond just a lot of
>     duplicated logic in the code, the _Atomic impl is kinda broken:
>     sometimes it emits llvm IR instructions directly (letting things
>     fall back to __sync_* calls), and sometimes it checks if it
>     should emit a libcall first (falling back to __atomic_* calls).
>     That's pretty bad behavior.
> 
> 
> 
> I'd like to make a proposal for cleaning this mess up.
> 
> 
> BTW, as a sidenote...
> One thing that's important to remember: at least on the C/C++ level,
> if you support lock-free atomic-ops for a given size/alignment, ALL
> the atomic ops for that size/alignment must be lock-free. This
> property is usually quite easy, because you'll have either LL/SC or
> CAS instructions, which can be used to implement any other
> operation. However, many older CPU models support atomic load,
> store, and exchange operations, but are missing any way to do an
> atomic compare-and-swap. This is true for at least ARMv5, SPARCv8,
> and Intel 80386. When your minimum CPU is set to such a cpu model,
> all atomic operations must be done via libcall -- it's not
> acceptable for atomic_store to emit an atomic "store" instruction,
> but atomic_fetch_add to require a libcall which gets implemented
> with a lock. If that were to happen, then atomic_fetch_add could not
> actually be made atomic versus a simultaneous atomic_store.
> 
> 
> 
> 
> So anyhow, there's basically two paths I think we could take for
> cleanup. I like "Alternative A" below better, but would be
> interested to hear if others have reasons to think the other would
> be preferable for some reason.
> 
> 
> Both alternatives I've suggested will have the effect that the
> __sync_* builtins in clang will now lower to __atomic_* function
> calls on platforms without inline atomics (or for unsupported
> sizes), and C11 atomics will stop being schizophrenic. Clang will
> cease to ever emit a call to a __sync_* function from any of its
> builtins or language support. This could theoretically cause an
> incompatibility on some target.
> 
> 
> However, I think it ought to be pretty safe: I can't imagine anyone
> who will use an upgraded compiler has __sync_* functions implemented
> for their platform, but doesn't have the __atomic_* library
> functions implemented. The __atomic_* builtins (which already use
> those library calls) are in widespread use, notably, both in
> libstdc++ and libc++, as well as in a lot of third-party code. IMO
> it's worth having that potential incompatibility, in order to
> simplify the situation for users (only one set of atomic library
> calls to worry about), and to have a single code-path for atomics in
> the compiler.
> 
> 
> 
> 
> 
> 
> Alternative A : Move all atomic libcall emission into LLVM
> 
> 
> In this alternative, LLVM will be responsible for lowering all atomic
> operations (to inline code or libcalls as appropriate) for all three
> code-paths listed at the beginning. Clang will emit no libcalls for
> atomic operations itself.
> 
> 
> 
> 
> A1) In LLVM: when lowering "store atomic", "load atomic",
> "atomicrmw", and "cmpxchg" instructions that aren't supported by the
> target, emit libcalls to the new __atomic_* functions, (rather than
> the current behavior of calling the legacy __sync_* functions.)
> 
> 
> Additionally, I'd like to move the decision to emit a libcall into
> AtomicExpandPass, and remove the ability to use Expand in ISel to
> create a libcall for ISD::ATOMIC_*. Putting the decision there
> allows querying the target, up front, for its supported sizes,
> before any other lowering decisions (either other parts of
> AtomicExpandPass and in ISel). When choosing whether to inline or
> libcall, only the size and alignment of the operation should be
> considered, and not the operation itself, or the type. This will
> ensure that the "all or none" property is adhered to.
> 
> 
> (Q: what about the implementation of
> __atomic_is_lock_free/__atomic_always_lock_free in clang? The clang
> frontend can't query target information from the llvm backend, can
> it? Is there some other way to expose the information of what sizes
> are supported by a backend so that the clang builtins -- the latter
> of which must be a C++ constant expression -- can also use it?
> Or...just continue duplicating the info in both places, as is done
> today...)
> 
> 
> A2) In clang, start removing the code that knows how to do optimized
> library call lowering for atomics -- always emit llvm atomic ops.
> Except for one case: clang will still need to emit library calls
> itself for data not aligned naturally for its size. The LLVM atomic
> instructions currently will not handle unaligned data, but unaligned
> data is allowed for the four "slab of memory" builtins
> (__atomic_load, __atomic_store, __atomic_compare_exchange, and
> __atomic_exchange).
> 
> 
> A3) In LLVM, add "align" attributes to cmpxchg and atomicrmw, and
> allow specifying "align" values for "load atomic" and "store atomic"
> (where the attribute currently exists but cannot be used). LLVM will
> then be able to lower misaligned accesses to library calls. In
> clang, get rid of special case for directly emitting libcalls for
> misaligned atomics, and lower to llvm instructions there, as well.
> 
> 
> A4) Hopefully, now, the three codepaths in clang will be sufficiently
> the same (as they're all now just creating llvm atomic instructions)
> that they can be trivially merged, or else they are so trivial that
> the remaining duplication is not worthwhile to merge.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Alternative B : Move all libcall emission for atomics into Clang.
> 
> 
> In this alternative, LLVM will never emit atomic libcalls from the
> atomic IR instructions. If the operation requested is not possible
> on a given target, that is an error. So, the cleanups here are to
> get clang to stop emitting LLVM IR atomics that cannot be lowered
> without libcalls.
> 
> 
> B1) In Clang: have the legacy __sync_* builtins become essentially
> aliases for the __atomic_* builtins (thus they will emit calls to
> __atomic_* library functions when applicable).
> 
> 
> B2) In Clang: Have the C11 _Atomic type support and openmp stuff call
> into the __atomic builtins' code to do their dirty work, instead of
> having its own separate implementation.
> 
> 
> B3) After those two changes, I believe clang will only ever emit
> atomic IR instructions when they can be lowered. So then, in LLVM:
> get rid of the fallback to libcalls to __sync_* from the atomic IR
> instructions. If an atomic IR operation is requested and not
> implementable lock-free, it will be an error, with no fallback.
> 
> 
> 
> _______________________________________________
> LLVM Developers mailing list llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
> Linux Foundation Collaborative Project
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory