[llvm] r266002 - Add __atomic_* lowering to AtomicExpandPass.

James Y Knight via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 12 08:08:19 PDT 2016


No need to apologize to me for reverting! Thanks for taking care of it.

On Tue, Apr 12, 2016 at 8:39 AM, Rafael Espíndola <
rafael.espindola at gmail.com> wrote:

> Sorry, this broke the msan bots so I reverted it:
>
>
> http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/11839/steps/check-llvm%20msan/logs/stdio
>
> Cheers,
> Rafael
>
>
> On 11 April 2016 at 18:22, James Y Knight via llvm-commits
> <llvm-commits at lists.llvm.org> wrote:
> > Author: jyknight
> > Date: Mon Apr 11 17:22:33 2016
> > New Revision: 266002
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=266002&view=rev
> > Log:
> > Add __atomic_* lowering to AtomicExpandPass.
> >
> > AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and
> > cmpxchg instructions to __atomic_* library calls, when the target
> > doesn't support atomics of a given size.
> >
> > This is the first step towards moving all atomic lowering from clang
> > into llvm. When all is done, the behavior of __sync_* builtins,
> > __atomic_* builtins, and C11 atomics will be unified.
> >
> > Previously LLVM would pass everything through to the ISelLowering
> > code. There, unsupported atomic instructions would turn into __sync_*
> > library calls. Because of that behavior, Clang currently avoids emitting
> > llvm IR atomic instructions when this would happen, and emits __atomic_*
> > library functions itself, in the frontend.
> >
> > This change makes LLVM able to emit __atomic_* libcalls, and thus will
> > eventually allow clang to depend on LLVM to do the right thing.
> >
> > It is advantageous to do the new lowering to atomic libcalls in
> > AtomicExpandPass, before ISel time, because it's important that all
> > atomic operations for a given size either lower to __atomic_*
> > libcalls (which may use locks) or to native instructions (which
> > won't). No mixing and matching.
> >
> > At the moment, this code is enabled only for SPARC, as a
> > demonstration. The next commit will expand support to all of the other
> > targets.
> >
> > Differential Revision: http://reviews.llvm.org/D18200
> >
> > Added:
> >     llvm/trunk/test/Transforms/AtomicExpand/SPARC/
> >     llvm/trunk/test/Transforms/AtomicExpand/SPARC/libcalls.ll
> >     llvm/trunk/test/Transforms/AtomicExpand/SPARC/lit.local.cfg
> > Modified:
> >     llvm/trunk/docs/Atomics.rst
> >     llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h
> >     llvm/trunk/include/llvm/Target/TargetLowering.h
> >     llvm/trunk/lib/CodeGen/AtomicExpandPass.cpp
> >     llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp
> >     llvm/trunk/lib/Target/Sparc/SparcISelLowering.cpp
> >
> > Modified: llvm/trunk/docs/Atomics.rst
> > URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/Atomics.rst?rev=266002&r1=266001&r2=266002&view=diff
> >
> ==============================================================================
> > --- llvm/trunk/docs/Atomics.rst (original)
> > +++ llvm/trunk/docs/Atomics.rst Mon Apr 11 17:22:33 2016
> > @@ -413,19 +413,28 @@ The MachineMemOperand for all atomic ope
> >  this is not correct in the IR sense of volatile, but CodeGen handles
> anything
> >  marked volatile very conservatively.  This should get fixed at some
> point.
> >
> > -Common architectures have some way of representing at least a
> pointer-sized
> > -lock-free ``cmpxchg``; such an operation can be used to implement all
> the other
> > -atomic operations which can be represented in IR up to that size.
> Backends are
> > -expected to implement all those operations, but not operations which
> cannot be
> > -implemented in a lock-free manner.  It is expected that backends will
> give an
> > -error when given an operation which cannot be implemented.  (The LLVM
> code
> > -generator is not very helpful here at the moment, but hopefully that
> will
> > -change.)
> > +One very important property of the atomic operations is that if your
> backend
> > +supports any inline lock-free atomic operations of a given size, you
> should
> > +support *ALL* operations of that size in a lock-free manner.
> > +
> > +When the target implements atomic ``cmpxchg`` or LL/SC instructions (as
> most do)
> > +this is trivial: all the other operations can be implemented on top of
> those
> > +primitives. However, on many older CPUs (e.g. ARMv5, SparcV8, Intel
> 80386) there
> > +are atomic load and store instructions, but no ``cmpxchg`` or LL/SC. It
> > +would be invalid to implement ``atomic load`` with the native instruction
> > +but ``cmpxchg`` with a library call to a function that uses a mutex; on
> > +such architectures, ``atomic load`` must *also* expand to a library call,
> > +using the same mutex, so that it stays atomic with regard to a
> > +simultaneous ``cmpxchg``.
> > +
> > +AtomicExpandPass can help with that: it will expand all atomic
> operations to the
> > +proper ``__atomic_*`` libcalls for any size above the maximum set by
> > +``setMaxAtomicSizeInBitsSupported`` (which defaults to 0).
> >
> >  On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent
> stores
> >  generate an ``XCHG``, other stores generate a ``MOV``.
> SequentiallyConsistent
> >  fences generate an ``MFENCE``, other fences do not cause any code to be
> > -generated.  cmpxchg uses the ``LOCK CMPXCHG`` instruction.  ``atomicrmw
> xchg``
> > +generated.  ``cmpxchg`` uses the ``LOCK CMPXCHG`` instruction.
> ``atomicrmw xchg``
> >  uses ``XCHG``, ``atomicrmw add`` and ``atomicrmw sub`` use ``XADD``,
> and all
> >  other ``atomicrmw`` operations generate a loop with ``LOCK CMPXCHG``.
> Depending
> >  on the users of the result, some ``atomicrmw`` operations can be
> translated into
> > @@ -446,10 +455,151 @@ atomic constructs. Here are some lowerin
> >    ``emitStoreConditional()``
> >  * large loads/stores -> ll-sc/cmpxchg
> >    by overriding
> ``shouldExpandAtomicStoreInIR()``/``shouldExpandAtomicLoadInIR()``
> > -* strong atomic accesses -> monotonic accesses + fences
> > -  by using ``setInsertFencesForAtomic()`` and overriding
> ``emitLeadingFence()``
> > -  and ``emitTrailingFence()``
> > +* strong atomic accesses -> monotonic accesses + fences by overriding
> > +  ``shouldInsertFencesForAtomic()``, ``emitLeadingFence()``, and
> > +  ``emitTrailingFence()``
> >  * atomic rmw -> loop with cmpxchg or load-linked/store-conditional
> >    by overriding ``expandAtomicRMWInIR()``
> > +* expansion to __atomic_* libcalls for unsupported sizes.
> >
> >  For an example of all of these, look at the ARM backend.
> > +
> > +Libcalls: __atomic_*
> > +====================
> > +
> > +There are two kinds of atomic library calls that are generated by LLVM.
> Please
> > +note that both sets of library functions somewhat confusingly share the
> names of
> > +builtin functions defined by clang. Despite this, the library functions
> are
> > +not directly related to the builtins: it is *not* the case that
> ``__atomic_*``
> > +builtins lower to ``__atomic_*`` library calls and ``__sync_*``
> builtins lower
> > +to ``__sync_*`` library calls.
> > +
> > +The first set of library functions are named ``__atomic_*``. This set
> has been
> > +"standardized" by GCC, and is described below. (See also `GCC's
> documentation
> > +<https://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary>`_)
> > +
> > +LLVM's AtomicExpandPass will translate atomic operations on data sizes
> above
> > +``MaxAtomicSizeInBitsSupported`` into calls to these functions.
> > +
> > +There are four generic functions, which can be called with data of any
> size or
> > +alignment::
> > +
> > +   void __atomic_load(size_t size, void *ptr, void *ret, int ordering)
> > +   void __atomic_store(size_t size, void *ptr, void *val, int ordering)
> > +   void __atomic_exchange(size_t size, void *ptr, void *val, void *ret,
> int ordering)
> > +   bool __atomic_compare_exchange(size_t size, void *ptr, void
> *expected, void *desired, int success_order, int failure_order)
> > +
> > +There are also size-specialized versions of the above functions, which
> can only
> > +be used with *naturally-aligned* pointers of the appropriate size. In
> the
> > +signatures below, "N" is one of 1, 2, 4, 8, and 16, and "iN" is the
> appropriate
> > +integer type of that size; if no such integer type exists, the
> specialization
> > +cannot be used::
> > +
> > +   iN __atomic_load_N(iN *ptr, int ordering)
> > +   void __atomic_store_N(iN *ptr, iN val, int ordering)
> > +   iN __atomic_exchange_N(iN *ptr, iN val, int ordering)
> > +   bool __atomic_compare_exchange_N(iN *ptr, iN *expected, iN desired,
> int success_order, int failure_order)
> > +
> > +Finally there are some read-modify-write functions, which are only
> available in
> > +the size-specific variants (any other sizes use a
> ``__atomic_compare_exchange``
> > +loop)::
> > +
> > +   iN __atomic_fetch_add_N(iN *ptr, iN val, int ordering)
> > +   iN __atomic_fetch_sub_N(iN *ptr, iN val, int ordering)
> > +   iN __atomic_fetch_and_N(iN *ptr, iN val, int ordering)
> > +   iN __atomic_fetch_or_N(iN *ptr, iN val, int ordering)
> > +   iN __atomic_fetch_xor_N(iN *ptr, iN val, int ordering)
> > +   iN __atomic_fetch_nand_N(iN *ptr, iN val, int ordering)
> > +
> > +This set of library functions has some interesting implementation
> requirements
> > +to take note of:
> > +
> > +- They support all sizes and alignments -- including those which cannot
> be
> > +  implemented natively on any existing hardware. Therefore, they will
> certainly
> > +  use mutexes for some sizes/alignments.
> > +
> > +- As a consequence, they cannot be shipped in a statically linked
> > +  compiler-support library, as they have state which must be shared
> amongst all
> > +  DSOs loaded in the program. They must be provided in a shared library
> used by
> > +  all objects.
> > +
> > +- The set of atomic sizes supported lock-free must be a superset of the
> sizes
> > +  any compiler can emit. That is: if a new compiler introduces support
> for
> > +  inline-lock-free atomics of size N, the ``__atomic_*`` functions must
> also have a
> > +  lock-free implementation for size N. This is a requirement so that
> code
> > +  produced by an old compiler (which will have called the
> ``__atomic_*`` function)
> > +  interoperates with code produced by the new compiler (which will use the
> > +  native atomic instruction).
> > +
> > +Note that it's possible to write an entirely target-independent
> implementation
> > +of these library functions by using the compiler atomic builtins
> themselves to
> > +implement the operations on naturally-aligned pointers of supported
> sizes, and a
> > +generic mutex implementation otherwise.
> > +
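For illustration, here is a minimal sketch of such a target-independent
implementation, written on top of the compiler's own __atomic builtins plus a
mutex fallback. This is an assumption-laden example, not the actual compiler-rt
or libatomic sources: the entry-point name generic_atomic_load and the single
global lock are hypothetical simplifications (real libraries shard the lock by
address, handle every size, and honor the ordering argument).

   // Sketch only: a stand-in for the generic __atomic_load entry point.
   #include <cstddef>
   #include <cstdint>
   #include <cstring>
   #include <mutex>

   static std::mutex AtomicLock; // simplification: one lock shared by all objects

   extern "C" void generic_atomic_load(std::size_t Size, void *Ptr, void *Ret,
                                       int /*Ordering*/) {
     // Lock-free fast path for a size the compiler can always handle inline,
     // assuming the caller passed a naturally-aligned pointer.
     if (Size == sizeof(std::uint32_t) &&
         __atomic_always_lock_free(sizeof(std::uint32_t), nullptr)) {
       std::uint32_t V =
           __atomic_load_n(static_cast<std::uint32_t *>(Ptr), __ATOMIC_SEQ_CST);
       std::memcpy(Ret, &V, sizeof(V));
       return;
     }
     // Everything else falls back to the shared mutex, so that all operations
     // on the same object remain atomic with respect to one another.
     std::lock_guard<std::mutex> Guard(AtomicLock);
     std::memcpy(Ret, Ptr, Size);
   }

The key design point, as the text above notes, is that every operation on an
object either uses the lock-free path or the shared lock, never a mixture.
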
> > +Libcalls: __sync_*
> > +==================
> > +
> > +Some targets or OS/target combinations can support lock-free atomics,
> but for
> > +various reasons, it is not practical to emit the instructions inline.
> > +
> > +There are two typical examples of this.
> > +
> > +Some CPUs support multiple instruction sets which can be switched back
> and forth
> > +on function-call boundaries. For example, MIPS supports the MIPS16 ISA,
> which
> > +has a smaller instruction encoding than the usual MIPS32 ISA. ARM,
> similarly,
> > +has the Thumb ISA. In MIPS16 and earlier versions of Thumb, the atomic
> > +instructions are not encodable. However, those instructions are
> available via a
> > +function call to a function with the longer encoding.
> > +
> > +Additionally, a few OS/target pairs provide kernel-supported lock-free
> > +atomics. ARM/Linux is an example of this: the kernel `provides
> > +<https://www.kernel.org/doc/Documentation/arm/kernel_user_helpers.txt>`_
> a
> > +function which on older CPUs contains a "magically-restartable" atomic
> sequence
> > +(which looks atomic so long as there's only one CPU), and contains
> actual atomic
> > +instructions on newer multicore models. This sort of functionality can
> typically
> > +be provided on any architecture, if all CPUs which are missing atomic
> > +compare-and-swap support are uniprocessor (no SMP). This is almost
> always the
> > +case. The only common architecture without that property is SPARC --
> SPARCV8 SMP
> > +systems were common, yet it doesn't support any sort of compare-and-swap
> > +operation.
> > +
> > +In either of these cases, the Target in LLVM can claim support for
> atomics of an
> > +appropriate size, and then implement some subset of the operations via
> libcalls
> > +to a ``__sync_*`` function. Such functions *must* not use locks in their
> > +implementation, because unlike the ``__atomic_*`` routines used by
> > +AtomicExpandPass, these may be mixed-and-matched with native
> instructions by the
> > +target lowering.
> > +
> > +Further, these routines do not need to be shared, as they are
> stateless. So,
> > +there is no issue with having multiple copies included in one binary.
> Thus,
> > +typically these routines are implemented by the statically-linked
> compiler
> > +runtime support library.
> > +
> > +LLVM will emit a call to an appropriate ``__sync_*`` routine if the
> target
> > +ISelLowering code has set the corresponding ``ATOMIC_CMPXCHG``,
> ``ATOMIC_SWAP``,
> > +or ``ATOMIC_LOAD_*`` operation to "Expand", and if it has opted into the
> > +availability of those library functions via a call to ``initSyncLibcalls()``.
> > +
> > +The full set of functions that may be called by LLVM is (for ``N``
> being 1, 2,
> > +4, 8, or 16)::
> > +
> > +  iN __sync_val_compare_and_swap_N(iN *ptr, iN expected, iN desired)
> > +  iN __sync_lock_test_and_set_N(iN *ptr, iN val)
> > +  iN __sync_fetch_and_add_N(iN *ptr, iN val)
> > +  iN __sync_fetch_and_sub_N(iN *ptr, iN val)
> > +  iN __sync_fetch_and_and_N(iN *ptr, iN val)
> > +  iN __sync_fetch_and_or_N(iN *ptr, iN val)
> > +  iN __sync_fetch_and_xor_N(iN *ptr, iN val)
> > +  iN __sync_fetch_and_nand_N(iN *ptr, iN val)
> > +  iN __sync_fetch_and_max_N(iN *ptr, iN val)
> > +  iN __sync_fetch_and_umax_N(iN *ptr, iN val)
> > +  iN __sync_fetch_and_min_N(iN *ptr, iN val)
> > +  iN __sync_fetch_and_umin_N(iN *ptr, iN val)
> > +
> > +This list doesn't include any function for atomic load or store; all
> known
> > +architectures support atomic loads and stores directly (possibly by
> emitting a
> > +fence on either side of a normal load or store.)
> > +
> > +There's also, somewhat separately, the possibility to lower
> ``ATOMIC_FENCE`` to
> > +``__sync_synchronize()``. This may happen or not happen independent of
> all the
> > +above, controlled purely by ``setOperationAction(ISD::ATOMIC_FENCE,
> ...)``.
> >
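The setOperationAction-based opt-in described a few paragraphs above might look
like the following inside a backend's TargetLowering constructor. A hedged
sketch: the enclosing XYZTargetLowering class is hypothetical, while
setOperationAction and the ISD::ATOMIC_* node kinds are the existing hooks the
text refers to.

   // Fragment from a hypothetical XYZTargetLowering constructor: 32-bit
   // atomics stay legal natively, 64-bit ones are expanded by ISel into
   // __sync_* libcalls.
   setOperationAction(ISD::ATOMIC_SWAP,     MVT::i64, Expand);
   setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i64, Expand);
   setOperationAction(ISD::ATOMIC_LOAD_ADD, MVT::i64, Expand);
   // Independently of the above, fences can be lowered to __sync_synchronize():
   setOperationAction(ISD::ATOMIC_FENCE, MVT::Other, Expand);
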
> > Modified: llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h
> > URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h?rev=266002&r1=266001&r2=266002&view=diff
> >
> ==============================================================================
> > --- llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h (original)
> > +++ llvm/trunk/include/llvm/CodeGen/RuntimeLibcalls.h Mon Apr 11
> 17:22:33 2016
> > @@ -336,7 +336,11 @@ namespace RTLIB {
> >      // EXCEPTION HANDLING
> >      UNWIND_RESUME,
> >
> > -    // Family ATOMICs
> > +    // Note: there's two sets of atomics libcalls; see
> > +    // <http://llvm.org/docs/Atomics.html> for more info on the
> > +    // difference between them.
> > +
> > +    // Atomic '__sync_*' libcalls.
> >      SYNC_VAL_COMPARE_AND_SWAP_1,
> >      SYNC_VAL_COMPARE_AND_SWAP_2,
> >      SYNC_VAL_COMPARE_AND_SWAP_4,
> > @@ -398,6 +402,73 @@ namespace RTLIB {
> >      SYNC_FETCH_AND_UMIN_8,
> >      SYNC_FETCH_AND_UMIN_16,
> >
> > +    // Atomic '__atomic_*' libcalls.
> > +    ATOMIC_LOAD,
> > +    ATOMIC_LOAD_1,
> > +    ATOMIC_LOAD_2,
> > +    ATOMIC_LOAD_4,
> > +    ATOMIC_LOAD_8,
> > +    ATOMIC_LOAD_16,
> > +
> > +    ATOMIC_STORE,
> > +    ATOMIC_STORE_1,
> > +    ATOMIC_STORE_2,
> > +    ATOMIC_STORE_4,
> > +    ATOMIC_STORE_8,
> > +    ATOMIC_STORE_16,
> > +
> > +    ATOMIC_EXCHANGE,
> > +    ATOMIC_EXCHANGE_1,
> > +    ATOMIC_EXCHANGE_2,
> > +    ATOMIC_EXCHANGE_4,
> > +    ATOMIC_EXCHANGE_8,
> > +    ATOMIC_EXCHANGE_16,
> > +
> > +    ATOMIC_COMPARE_EXCHANGE,
> > +    ATOMIC_COMPARE_EXCHANGE_1,
> > +    ATOMIC_COMPARE_EXCHANGE_2,
> > +    ATOMIC_COMPARE_EXCHANGE_4,
> > +    ATOMIC_COMPARE_EXCHANGE_8,
> > +    ATOMIC_COMPARE_EXCHANGE_16,
> > +
> > +    ATOMIC_FETCH_ADD_1,
> > +    ATOMIC_FETCH_ADD_2,
> > +    ATOMIC_FETCH_ADD_4,
> > +    ATOMIC_FETCH_ADD_8,
> > +    ATOMIC_FETCH_ADD_16,
> > +
> > +    ATOMIC_FETCH_SUB_1,
> > +    ATOMIC_FETCH_SUB_2,
> > +    ATOMIC_FETCH_SUB_4,
> > +    ATOMIC_FETCH_SUB_8,
> > +    ATOMIC_FETCH_SUB_16,
> > +
> > +    ATOMIC_FETCH_AND_1,
> > +    ATOMIC_FETCH_AND_2,
> > +    ATOMIC_FETCH_AND_4,
> > +    ATOMIC_FETCH_AND_8,
> > +    ATOMIC_FETCH_AND_16,
> > +
> > +    ATOMIC_FETCH_OR_1,
> > +    ATOMIC_FETCH_OR_2,
> > +    ATOMIC_FETCH_OR_4,
> > +    ATOMIC_FETCH_OR_8,
> > +    ATOMIC_FETCH_OR_16,
> > +
> > +    ATOMIC_FETCH_XOR_1,
> > +    ATOMIC_FETCH_XOR_2,
> > +    ATOMIC_FETCH_XOR_4,
> > +    ATOMIC_FETCH_XOR_8,
> > +    ATOMIC_FETCH_XOR_16,
> > +
> > +    ATOMIC_FETCH_NAND_1,
> > +    ATOMIC_FETCH_NAND_2,
> > +    ATOMIC_FETCH_NAND_4,
> > +    ATOMIC_FETCH_NAND_8,
> > +    ATOMIC_FETCH_NAND_16,
> > +
> > +    ATOMIC_IS_LOCK_FREE,
> > +
> >      // Stack Protector Fail.
> >      STACKPROTECTOR_CHECK_FAIL,
> >
> >
> > Modified: llvm/trunk/include/llvm/Target/TargetLowering.h
> > URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetLowering.h?rev=266002&r1=266001&r2=266002&view=diff
> >
> ==============================================================================
> > --- llvm/trunk/include/llvm/Target/TargetLowering.h (original)
> > +++ llvm/trunk/include/llvm/Target/TargetLowering.h Mon Apr 11 17:22:33
> 2016
> > @@ -1059,6 +1059,14 @@ public:
> >    /// \name Helpers for atomic expansion.
> >    /// @{
> >
> > +  /// Returns the maximum atomic operation size (in bits) supported by
> > +  /// the backend. Atomic operations greater than this size (as well
> > +  /// as ones that are not naturally aligned) will be expanded by
> > +  /// AtomicExpandPass into an __atomic_* library call.
> > +  unsigned getMaxAtomicSizeInBitsSupported() const {
> > +    return MaxAtomicSizeInBitsSupported;
> > +  }
> > +
> >    /// Whether AtomicExpandPass should automatically insert fences and
> reduce
> >    /// ordering for this atomic. This should be true for most
> architectures with
> >    /// weak memory ordering. Defaults to false.
> > @@ -1454,6 +1462,14 @@ protected:
> >      MinStackArgumentAlignment = Align;
> >    }
> >
> > +  /// Set the maximum atomic operation size supported by the
> > +  /// backend. Atomic operations greater than this size (as well as
> > +  /// ones that are not naturally aligned) will be expanded by
> > +  /// AtomicExpandPass into an __atomic_* library call.
> > +  void setMaxAtomicSizeInBitsSupported(unsigned SizeInBits) {
> > +    MaxAtomicSizeInBitsSupported = SizeInBits;
> > +  }
> > +
> >  public:
> >
> //===--------------------------------------------------------------------===//
> >    // Addressing mode description hooks (used by LSR etc).
> > @@ -1863,6 +1879,9 @@ private:
> >    /// The preferred loop alignment.
> >    unsigned PrefLoopAlignment;
> >
> > +  /// Size in bits of the maximum atomics size the backend supports.
> > +  /// Accesses larger than this will be expanded by AtomicExpandPass.
> > +  unsigned MaxAtomicSizeInBitsSupported;
> >
> >    /// If set to a physical register, this specifies the register that
> >    /// llvm.savestack/llvm.restorestack should save and restore.
> >
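A minimal sketch of how a backend might use the new hook follows; the
SparcISelLowering.cpp hunk later in this patch is the real in-tree example,
whereas the XYZTargetLowering/XYZSubtarget names and the hasNativeAtomics()
predicate here are hypothetical.

   // Hypothetical backend constructor fragment (assumes the usual
   // TargetLowering/Subtarget headers for this imaginary XYZ target).
   XYZTargetLowering::XYZTargetLowering(const TargetMachine &TM,
                                        const XYZSubtarget &STI)
       : TargetLowering(TM) {
     if (STI.hasNativeAtomics()) {
       // Atomics up to 32 bits are lowered inline; AtomicExpandPass turns
       // anything larger (or insufficiently aligned) into __atomic_* libcalls.
       setMaxAtomicSizeInBitsSupported(32);
     } else {
       // No inline atomics at all: every atomic operation becomes a libcall.
       setMaxAtomicSizeInBitsSupported(0);
     }
   }
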
> > Modified: llvm/trunk/lib/CodeGen/AtomicExpandPass.cpp
> > URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/AtomicExpandPass.cpp?rev=266002&r1=266001&r2=266002&view=diff
> >
> ==============================================================================
> > --- llvm/trunk/lib/CodeGen/AtomicExpandPass.cpp (original)
> > +++ llvm/trunk/lib/CodeGen/AtomicExpandPass.cpp Mon Apr 11 17:22:33 2016
> > @@ -8,10 +8,10 @@
> >
> //===----------------------------------------------------------------------===//
> >  //
> >  // This file contains a pass (at IR level) to replace atomic
> instructions with
> > -// target specific instruction which implement the same semantics in a
> way
> > -// which better fits the target backend.  This can include the use of
> either
> > -// (intrinsic-based) load-linked/store-conditional loops,
> AtomicCmpXchg, or
> > -// type coercions.
> > +// __atomic_* library calls, or target specific instruction which
> implement the
> > +// same semantics in a way which better fits the target backend.  This
> can
> > +// include the use of (intrinsic-based) load-linked/store-conditional
> loops,
> > +// AtomicCmpXchg, or type coercions.
> >  //
> >
> //===----------------------------------------------------------------------===//
> >
> > @@ -64,19 +64,95 @@ namespace {
> >      bool expandAtomicCmpXchg(AtomicCmpXchgInst *CI);
> >      bool isIdempotentRMW(AtomicRMWInst *AI);
> >      bool simplifyIdempotentRMW(AtomicRMWInst *AI);
> > +
> > +    bool expandAtomicOpToLibcall(Instruction *I, unsigned Size,
> unsigned Align,
> > +                                 Value *PointerOperand, Value
> *ValueOperand,
> > +                                 Value *CASExpected, AtomicOrdering
> Ordering,
> > +                                 AtomicOrdering Ordering2,
> > +                                 ArrayRef<RTLIB::Libcall> Libcalls);
> > +    void expandAtomicLoadToLibcall(LoadInst *LI);
> > +    void expandAtomicStoreToLibcall(StoreInst *LI);
> > +    void expandAtomicRMWToLibcall(AtomicRMWInst *I);
> > +    void expandAtomicCASToLibcall(AtomicCmpXchgInst *I);
> >    };
> >  }
> >
> >  char AtomicExpand::ID = 0;
> >  char &llvm::AtomicExpandID = AtomicExpand::ID;
> > -INITIALIZE_TM_PASS(AtomicExpand, "atomic-expand",
> > -    "Expand Atomic calls in terms of either load-linked &
> store-conditional or cmpxchg",
> > -    false, false)
> > +INITIALIZE_TM_PASS(AtomicExpand, "atomic-expand", "Expand Atomic
> instructions",
> > +                   false, false)
> >
> >  FunctionPass *llvm::createAtomicExpandPass(const TargetMachine *TM) {
> >    return new AtomicExpand(TM);
> >  }
> >
> > +namespace {
> > +// Helper functions to retrieve the size of atomic instructions.
> > +unsigned getAtomicOpSize(LoadInst *LI) {
> > +  const DataLayout &DL = LI->getModule()->getDataLayout();
> > +  return DL.getTypeStoreSize(LI->getType());
> > +}
> > +
> > +unsigned getAtomicOpSize(StoreInst *SI) {
> > +  const DataLayout &DL = SI->getModule()->getDataLayout();
> > +  return DL.getTypeStoreSize(SI->getValueOperand()->getType());
> > +}
> > +
> > +unsigned getAtomicOpSize(AtomicRMWInst *RMWI) {
> > +  const DataLayout &DL = RMWI->getModule()->getDataLayout();
> > +  return DL.getTypeStoreSize(RMWI->getValOperand()->getType());
> > +}
> > +
> > +unsigned getAtomicOpSize(AtomicCmpXchgInst *CASI) {
> > +  const DataLayout &DL = CASI->getModule()->getDataLayout();
> > +  return DL.getTypeStoreSize(CASI->getCompareOperand()->getType());
> > +}
> > +
> > +// Helper functions to retrieve the alignment of atomic instructions.
> > +unsigned getAtomicOpAlign(LoadInst *LI) {
> > +  unsigned Align = LI->getAlignment();
> > +  // In the future, if this IR restriction is relaxed, we should
> > +  // return DataLayout::getABITypeAlignment when there's no align
> > +  // value.
> > +  assert(Align != 0 && "An atomic LoadInst always has an explicit
> alignment");
> > +  return Align;
> > +}
> > +
> > +unsigned getAtomicOpAlign(StoreInst *SI) {
> > +  unsigned Align = SI->getAlignment();
> > +  // In the future, if this IR restriction is relaxed, we should
> > +  // return DataLayout::getABITypeAlignment when there's no align
> > +  // value.
> > +  assert(Align != 0 && "An atomic StoreInst always has an explicit
> alignment");
> > +  return Align;
> > +}
> > +
> > +unsigned getAtomicOpAlign(AtomicRMWInst *RMWI) {
> > +  // TODO(PR27168): This instruction has no alignment attribute, but
> unlike the
> > +  // default alignment for load/store, the default here is to assume
> > +  // it has NATURAL alignment, not DataLayout-specified alignment.
> > +  const DataLayout &DL = RMWI->getModule()->getDataLayout();
> > +  return DL.getTypeStoreSize(RMWI->getValOperand()->getType());
> > +}
> > +
> > +unsigned getAtomicOpAlign(AtomicCmpXchgInst *CASI) {
> > +  // TODO(PR27168): same comment as above.
> > +  const DataLayout &DL = CASI->getModule()->getDataLayout();
> > +  return DL.getTypeStoreSize(CASI->getCompareOperand()->getType());
> > +}
> > +
> > +// Determine if a particular atomic operation has a supported size,
> > +// and is of appropriate alignment, to be passed through for target
> > +// lowering. (Versus turning into a __atomic libcall)
> > +template <typename Inst>
> > +bool atomicSizeSupported(const TargetLowering *TLI, Inst *I) {
> > +  unsigned Size = getAtomicOpSize(I);
> > +  unsigned Align = getAtomicOpAlign(I);
> > +  return Align >= Size && Size <=
> TLI->getMaxAtomicSizeInBitsSupported() / 8;
> > +}
> > +
> > +} // end anonymous namespace
> > +
> >  bool AtomicExpand::runOnFunction(Function &F) {
> >    if (!TM || !TM->getSubtargetImpl(F)->enableAtomicExpand())
> >      return false;
> > @@ -100,6 +176,33 @@ bool AtomicExpand::runOnFunction(Functio
> >      auto CASI = dyn_cast<AtomicCmpXchgInst>(I);
> >      assert((LI || SI || RMWI || CASI) && "Unknown atomic instruction");
> >
> > +    // If the Size/Alignment is not supported, replace with a libcall.
> > +    if (LI) {
> > +      if (!atomicSizeSupported(TLI, LI)) {
> > +        expandAtomicLoadToLibcall(LI);
> > +        MadeChange = true;
> > +        continue;
> > +      }
> > +    } else if (SI) {
> > +      if (!atomicSizeSupported(TLI, SI)) {
> > +        expandAtomicStoreToLibcall(SI);
> > +        MadeChange = true;
> > +        continue;
> > +      }
> > +    } else if (RMWI) {
> > +      if (!atomicSizeSupported(TLI, RMWI)) {
> > +        expandAtomicRMWToLibcall(RMWI);
> > +        MadeChange = true;
> > +        continue;
> > +      }
> > +    } else if (CASI) {
> > +      if (!atomicSizeSupported(TLI, CASI)) {
> > +        expandAtomicCASToLibcall(CASI);
> > +        MadeChange = true;
> > +        continue;
> > +      }
> > +    }
> > +
> >      if (TLI->shouldInsertFencesForAtomic(I)) {
> >        auto FenceOrdering = AtomicOrdering::Monotonic;
> >        bool IsStore, IsLoad;
> > @@ -144,7 +247,7 @@ bool AtomicExpand::runOnFunction(Functio
> >          assert(LI->getType()->isIntegerTy() && "invariant broken");
> >          MadeChange = true;
> >        }
> > -
> > +
> >        MadeChange |= tryExpandAtomicLoad(LI);
> >      } else if (SI) {
> >        if (SI->getValueOperand()->getType()->isFloatingPointTy()) {
> > @@ -833,3 +936,381 @@ bool llvm::expandAtomicRMWToCmpXchg(Atom
> >
> >    return true;
> >  }
> > +
> > +// This converts from LLVM's internal AtomicOrdering enum to the
> > +// memory_order_* value required by the __atomic_* libcalls.
> > +static int libcallAtomicModel(AtomicOrdering AO) {
> > +  enum {
> > +    AO_ABI_memory_order_relaxed = 0,
> > +    AO_ABI_memory_order_consume = 1,
> > +    AO_ABI_memory_order_acquire = 2,
> > +    AO_ABI_memory_order_release = 3,
> > +    AO_ABI_memory_order_acq_rel = 4,
> > +    AO_ABI_memory_order_seq_cst = 5
> > +  };
> > +
> > +  switch (AO) {
> > +  case AtomicOrdering::NotAtomic:
> > +    llvm_unreachable("Expected atomic memory order.");
> > +  case AtomicOrdering::Unordered:
> > +  case AtomicOrdering::Monotonic:
> > +    return AO_ABI_memory_order_relaxed;
> > +  // Not implemented yet in llvm:
> > +  // case AtomicOrdering::Consume:
> > +  //  return AO_ABI_memory_order_consume;
> > +  case AtomicOrdering::Acquire:
> > +    return AO_ABI_memory_order_acquire;
> > +  case AtomicOrdering::Release:
> > +    return AO_ABI_memory_order_release;
> > +  case AtomicOrdering::AcquireRelease:
> > +    return AO_ABI_memory_order_acq_rel;
> > +  case AtomicOrdering::SequentiallyConsistent:
> > +    return AO_ABI_memory_order_seq_cst;
> > +  }
> > +  llvm_unreachable("Unknown atomic memory order.");
> > +}
> > +
> > +// In order to use one of the sized library calls such as
> > +// __atomic_fetch_add_4, the alignment must be sufficient, the size
> > +// must be one of the potentially-specialized sizes, and the value
> > +// type must actually exist in C on the target (otherwise, the
> > +// function wouldn't actually be defined.)
> > +static bool canUseSizedAtomicCall(unsigned Size, unsigned Align,
> > +                                  const DataLayout &DL) {
> > +  // TODO: "LargestSize" is an approximation for "largest type that
> > +  // you can express in C". It seems to be the case that int128 is
> > +  // supported on all 64-bit platforms, otherwise only up to 64-bit
> > +  // integers are supported. If we get this wrong, then we'll try to
> > +  // call a sized libcall that doesn't actually exist. There should
> > +  // really be some more reliable way in LLVM of determining integer
> > +  // sizes which are valid in the target's C ABI...
> > +  unsigned LargestSize = DL.getLargestLegalIntTypeSize() >= 64 ? 16 : 8;
> > +  return Align >= Size &&
> > +         (Size == 1 || Size == 2 || Size == 4 || Size == 8 || Size ==
> 16) &&
> > +         Size <= LargestSize;
> > +}
> > +
> > +void AtomicExpand::expandAtomicLoadToLibcall(LoadInst *I) {
> > +  static const RTLIB::Libcall Libcalls[6] = {
> > +      RTLIB::ATOMIC_LOAD,   RTLIB::ATOMIC_LOAD_1, RTLIB::ATOMIC_LOAD_2,
> > +      RTLIB::ATOMIC_LOAD_4, RTLIB::ATOMIC_LOAD_8,
> RTLIB::ATOMIC_LOAD_16};
> > +  unsigned Size = getAtomicOpSize(I);
> > +  unsigned Align = getAtomicOpAlign(I);
> > +
> > +  bool expanded = expandAtomicOpToLibcall(
> > +      I, Size, Align, I->getPointerOperand(), nullptr, nullptr,
> > +      I->getOrdering(), AtomicOrdering::NotAtomic, Libcalls);
> > +  assert(expanded && "expandAtomicOpToLibcall shouldn't fail for Load");
> > +}
> > +
> > +void AtomicExpand::expandAtomicStoreToLibcall(StoreInst *I) {
> > +  static const RTLIB::Libcall Libcalls[6] = {
> > +      RTLIB::ATOMIC_STORE,   RTLIB::ATOMIC_STORE_1,
> RTLIB::ATOMIC_STORE_2,
> > +      RTLIB::ATOMIC_STORE_4, RTLIB::ATOMIC_STORE_8,
> RTLIB::ATOMIC_STORE_16};
> > +  unsigned Size = getAtomicOpSize(I);
> > +  unsigned Align = getAtomicOpAlign(I);
> > +
> > +  bool expanded = expandAtomicOpToLibcall(
> > +      I, Size, Align, I->getPointerOperand(), I->getValueOperand(),
> nullptr,
> > +      I->getOrdering(), AtomicOrdering::NotAtomic, Libcalls);
> > +  assert(expanded && "expandAtomicOpToLibcall shouldn't fail for Store");
> > +}
> > +
> > +void AtomicExpand::expandAtomicCASToLibcall(AtomicCmpXchgInst *I) {
> > +  static const RTLIB::Libcall Libcalls[6] = {
> > +      RTLIB::ATOMIC_COMPARE_EXCHANGE,
>  RTLIB::ATOMIC_COMPARE_EXCHANGE_1,
> > +      RTLIB::ATOMIC_COMPARE_EXCHANGE_2,
> RTLIB::ATOMIC_COMPARE_EXCHANGE_4,
> > +      RTLIB::ATOMIC_COMPARE_EXCHANGE_8,
> RTLIB::ATOMIC_COMPARE_EXCHANGE_16};
> > +  unsigned Size = getAtomicOpSize(I);
> > +  unsigned Align = getAtomicOpAlign(I);
> > +
> > +  bool expanded = expandAtomicOpToLibcall(
> > +      I, Size, Align, I->getPointerOperand(), I->getNewValOperand(),
> > +      I->getCompareOperand(), I->getSuccessOrdering(),
> I->getFailureOrdering(),
> > +      Libcalls);
> > +  assert(expanded && "expandAtomicOpToLibcall shouldn't fail for CAS");
> > +}
> > +
> > +static ArrayRef<RTLIB::Libcall> GetRMWLibcall(AtomicRMWInst::BinOp Op) {
> > +  static const RTLIB::Libcall LibcallsXchg[6] = {
> > +      RTLIB::ATOMIC_EXCHANGE,   RTLIB::ATOMIC_EXCHANGE_1,
> > +      RTLIB::ATOMIC_EXCHANGE_2, RTLIB::ATOMIC_EXCHANGE_4,
> > +      RTLIB::ATOMIC_EXCHANGE_8, RTLIB::ATOMIC_EXCHANGE_16};
> > +  static const RTLIB::Libcall LibcallsAdd[6] = {
> > +      RTLIB::UNKNOWN_LIBCALL,    RTLIB::ATOMIC_FETCH_ADD_1,
> > +      RTLIB::ATOMIC_FETCH_ADD_2, RTLIB::ATOMIC_FETCH_ADD_4,
> > +      RTLIB::ATOMIC_FETCH_ADD_8, RTLIB::ATOMIC_FETCH_ADD_16};
> > +  static const RTLIB::Libcall LibcallsSub[6] = {
> > +      RTLIB::UNKNOWN_LIBCALL,    RTLIB::ATOMIC_FETCH_SUB_1,
> > +      RTLIB::ATOMIC_FETCH_SUB_2, RTLIB::ATOMIC_FETCH_SUB_4,
> > +      RTLIB::ATOMIC_FETCH_SUB_8, RTLIB::ATOMIC_FETCH_SUB_16};
> > +  static const RTLIB::Libcall LibcallsAnd[6] = {
> > +      RTLIB::UNKNOWN_LIBCALL,    RTLIB::ATOMIC_FETCH_AND_1,
> > +      RTLIB::ATOMIC_FETCH_AND_2, RTLIB::ATOMIC_FETCH_AND_4,
> > +      RTLIB::ATOMIC_FETCH_AND_8, RTLIB::ATOMIC_FETCH_AND_16};
> > +  static const RTLIB::Libcall LibcallsOr[6] = {
> > +      RTLIB::UNKNOWN_LIBCALL,   RTLIB::ATOMIC_FETCH_OR_1,
> > +      RTLIB::ATOMIC_FETCH_OR_2, RTLIB::ATOMIC_FETCH_OR_4,
> > +      RTLIB::ATOMIC_FETCH_OR_8, RTLIB::ATOMIC_FETCH_OR_16};
> > +  static const RTLIB::Libcall LibcallsXor[6] = {
> > +      RTLIB::UNKNOWN_LIBCALL,    RTLIB::ATOMIC_FETCH_XOR_1,
> > +      RTLIB::ATOMIC_FETCH_XOR_2, RTLIB::ATOMIC_FETCH_XOR_4,
> > +      RTLIB::ATOMIC_FETCH_XOR_8, RTLIB::ATOMIC_FETCH_XOR_16};
> > +  static const RTLIB::Libcall LibcallsNand[6] = {
> > +      RTLIB::UNKNOWN_LIBCALL,     RTLIB::ATOMIC_FETCH_NAND_1,
> > +      RTLIB::ATOMIC_FETCH_NAND_2, RTLIB::ATOMIC_FETCH_NAND_4,
> > +      RTLIB::ATOMIC_FETCH_NAND_8, RTLIB::ATOMIC_FETCH_NAND_16};
> > +
> > +  switch (Op) {
> > +  case AtomicRMWInst::BAD_BINOP:
> > +    llvm_unreachable("Should not have BAD_BINOP.");
> > +  case AtomicRMWInst::Xchg:
> > +    return LibcallsXchg;
> > +  case AtomicRMWInst::Add:
> > +    return LibcallsAdd;
> > +  case AtomicRMWInst::Sub:
> > +    return LibcallsSub;
> > +  case AtomicRMWInst::And:
> > +    return LibcallsAnd;
> > +  case AtomicRMWInst::Or:
> > +    return LibcallsOr;
> > +  case AtomicRMWInst::Xor:
> > +    return LibcallsXor;
> > +  case AtomicRMWInst::Nand:
> > +    return LibcallsNand;
> > +  case AtomicRMWInst::Max:
> > +  case AtomicRMWInst::Min:
> > +  case AtomicRMWInst::UMax:
> > +  case AtomicRMWInst::UMin:
> > +    // No atomic libcalls are available for max/min/umax/umin.
> > +    return {};
> > +  }
> > +  llvm_unreachable("Unexpected AtomicRMW operation.");
> > +}
> > +
> > +void AtomicExpand::expandAtomicRMWToLibcall(AtomicRMWInst *I) {
> > +  ArrayRef<RTLIB::Libcall> Libcalls = GetRMWLibcall(I->getOperation());
> > +
> > +  unsigned Size = getAtomicOpSize(I);
> > +  unsigned Align = getAtomicOpAlign(I);
> > +
> > +  bool Success = false;
> > +  if (!Libcalls.empty())
> > +    Success = expandAtomicOpToLibcall(
> > +        I, Size, Align, I->getPointerOperand(), I->getValOperand(),
> nullptr,
> > +        I->getOrdering(), AtomicOrdering::NotAtomic, Libcalls);
> > +
> > +  // The expansion failed: either there were no libcalls at all for
> > +  // the operation (min/max), or there were only size-specialized
> > +  // libcalls (add/sub/etc) and we needed a generic. So, expand to a
> > +  // CAS libcall, via a CAS loop, instead.
> > +  if (!Success) {
> > +    expandAtomicRMWToCmpXchg(I, [this](IRBuilder<> &Builder, Value
> *Addr,
> > +                                       Value *Loaded, Value *NewVal,
> > +                                       AtomicOrdering MemOpOrder,
> > +                                       Value *&Success, Value
> *&NewLoaded) {
> > +      // Create the CAS instruction normally...
> > +      AtomicCmpXchgInst *Pair = Builder.CreateAtomicCmpXchg(
> > +          Addr, Loaded, NewVal, MemOpOrder,
> > +          AtomicCmpXchgInst::getStrongestFailureOrdering(MemOpOrder));
> > +      Success = Builder.CreateExtractValue(Pair, 1, "success");
> > +      NewLoaded = Builder.CreateExtractValue(Pair, 0, "newloaded");
> > +
> > +      // ...and then expand the CAS into a libcall.
> > +      expandAtomicCASToLibcall(Pair);
> > +    });
> > +  }
> > +}
> > +
> > +// A helper routine for the above expandAtomic*ToLibcall functions.
> > +//
> > +// 'Libcalls' contains an array of enum values for the particular
> > +// ATOMIC libcalls to be emitted. All of the other arguments besides
> > +// 'I' are extracted from the Instruction subclass by the
> > +// caller. Depending on the particular call, some will be null.
> > +bool AtomicExpand::expandAtomicOpToLibcall(
> > +    Instruction *I, unsigned Size, unsigned Align, Value
> *PointerOperand,
> > +    Value *ValueOperand, Value *CASExpected, AtomicOrdering Ordering,
> > +    AtomicOrdering Ordering2, ArrayRef<RTLIB::Libcall> Libcalls) {
> > +  assert(Libcalls.size() == 6);
> > +
> > +  LLVMContext &Ctx = I->getContext();
> > +  Module *M = I->getModule();
> > +  const DataLayout &DL = M->getDataLayout();
> > +  IRBuilder<> Builder(I);
> > +  IRBuilder<> AllocaBuilder(&I->getFunction()->getEntryBlock().front());
> > +
> > +  bool UseSizedLibcall = canUseSizedAtomicCall(Size, Align, DL);
> > +  Type *SizedIntTy = Type::getIntNTy(Ctx, Size * 8);
> > +
> > +  unsigned AllocaAlignment = DL.getPrefTypeAlignment(SizedIntTy);
> > +
> > +  // TODO: the "order" argument type is "int", not int32. So
> > +  // getInt32Ty may be wrong if the arch uses e.g. 16-bit ints.
> > +  ConstantInt *SizeVal64 = ConstantInt::get(Type::getInt64Ty(Ctx),
> Size);
> > +  Constant *OrderingVal =
> > +      ConstantInt::get(Type::getInt32Ty(Ctx),
> libcallAtomicModel(Ordering));
> > +  Constant *Ordering2Val = CASExpected
> > +                               ? ConstantInt::get(Type::getInt32Ty(Ctx),
> > +
> libcallAtomicModel(Ordering2))
> > +                               : nullptr;
> > +  bool HasResult = I->getType() != Type::getVoidTy(Ctx);
> > +
> > +  RTLIB::Libcall RTLibType;
> > +  if (UseSizedLibcall) {
> > +    switch (Size) {
> > +    case 1: RTLibType = Libcalls[1]; break;
> > +    case 2: RTLibType = Libcalls[2]; break;
> > +    case 4: RTLibType = Libcalls[3]; break;
> > +    case 8: RTLibType = Libcalls[4]; break;
> > +    case 16: RTLibType = Libcalls[5]; break;
> > +    }
> > +  } else if (Libcalls[0] != RTLIB::UNKNOWN_LIBCALL) {
> > +    RTLibType = Libcalls[0];
> > +  } else {
> > +    // Can't use sized function, and there's no generic for this
> > +    // operation, so give up.
> > +    return false;
> > +  }
> > +
> > +  // Build up the function call. There are two kinds. First, the sized
> > +  // variants.  These calls are going to be one of the following (with
> > +  // N=1,2,4,8,16):
> > +  //  iN    __atomic_load_N(iN *ptr, int ordering)
> > +  //  void  __atomic_store_N(iN *ptr, iN val, int ordering)
> > +  //  iN    __atomic_{exchange|fetch_*}_N(iN *ptr, iN val, int ordering)
> > +  //  bool  __atomic_compare_exchange_N(iN *ptr, iN *expected, iN
> desired,
> > +  //                                    int success_order, int
> failure_order)
> > +  //
> > +  // Note that these functions can be used for non-integer atomic
> > +  // operations; the values just need to be bitcast to integers on the
> > +  // way in and out.
> > +  //
> > +  // And, then, the generic variants. They look like the following:
> > +  //  void  __atomic_load(size_t size, void *ptr, void *ret, int
> ordering)
> > +  //  void  __atomic_store(size_t size, void *ptr, void *val, int
> ordering)
> > +  //  void  __atomic_exchange(size_t size, void *ptr, void *val, void
> *ret,
> > +  //                          int ordering)
> > +  //  bool  __atomic_compare_exchange(size_t size, void *ptr, void
> *expected,
> > +  //                                  void *desired, int success_order,
> > +  //                                  int failure_order)
> > +  //
> > +  // The different signatures are built up depending on the
> > +  // 'UseSizedLibcall', 'CASExpected', 'ValueOperand', and 'HasResult'
> > +  // variables.
> > +
> > +  AllocaInst *AllocaCASExpected = nullptr;
> > +  Value *AllocaCASExpected_i8 = nullptr;
> > +  AllocaInst *AllocaValue = nullptr;
> > +  Value *AllocaValue_i8 = nullptr;
> > +  AllocaInst *AllocaResult = nullptr;
> > +  Value *AllocaResult_i8 = nullptr;
> > +
> > +  Type *ResultTy;
> > +  SmallVector<Value *, 6> Args;
> > +  AttributeSet Attr;
> > +
> > +  // 'size' argument.
> > +  if (!UseSizedLibcall) {
> > +    // Note, getIntPtrType is assumed equivalent to size_t.
> > +    Args.push_back(ConstantInt::get(DL.getIntPtrType(Ctx), Size));
> > +  }
> > +
> > +  // 'ptr' argument.
> > +  Value *PtrVal =
> > +      Builder.CreateBitCast(PointerOperand, Type::getInt8PtrTy(Ctx));
> > +  Args.push_back(PtrVal);
> > +
> > +  // 'expected' argument, if present.
> > +  if (CASExpected) {
> > +    AllocaCASExpected =
> AllocaBuilder.CreateAlloca(CASExpected->getType());
> > +    AllocaCASExpected->setAlignment(AllocaAlignment);
> > +    AllocaCASExpected_i8 =
> > +        Builder.CreateBitCast(AllocaCASExpected,
> Type::getInt8PtrTy(Ctx));
> > +    Builder.CreateLifetimeStart(AllocaCASExpected_i8, SizeVal64);
> > +    Builder.CreateAlignedStore(CASExpected, AllocaCASExpected,
> AllocaAlignment);
> > +    Args.push_back(AllocaCASExpected_i8);
> > +  }
> > +
> > +  // 'val' argument ('desired' for cas), if present.
> > +  if (ValueOperand) {
> > +    if (UseSizedLibcall) {
> > +      Value *IntValue =
> > +          Builder.CreateBitOrPointerCast(ValueOperand, SizedIntTy);
> > +      Args.push_back(IntValue);
> > +    } else {
> > +      AllocaValue = AllocaBuilder.CreateAlloca(ValueOperand->getType());
> > +      AllocaValue->setAlignment(AllocaAlignment);
> > +      AllocaValue_i8 =
> > +          Builder.CreateBitCast(AllocaValue, Type::getInt8PtrTy(Ctx));
> > +      Builder.CreateLifetimeStart(AllocaValue_i8, SizeVal64);
> > +      Builder.CreateAlignedStore(ValueOperand, AllocaValue,
> AllocaAlignment);
> > +      Args.push_back(AllocaValue_i8);
> > +    }
> > +  }
> > +
> > +  // 'ret' argument.
> > +  if (!CASExpected && HasResult && !UseSizedLibcall) {
> > +    AllocaResult = AllocaBuilder.CreateAlloca(I->getType());
> > +    AllocaResult->setAlignment(AllocaAlignment);
> > +    AllocaResult_i8 =
> > +        Builder.CreateBitCast(AllocaResult, Type::getInt8PtrTy(Ctx));
> > +    Builder.CreateLifetimeStart(AllocaResult_i8, SizeVal64);
> > +    Args.push_back(AllocaResult_i8);
> > +  }
> > +
> > +  // 'ordering' ('success_order' for cas) argument.
> > +  Args.push_back(OrderingVal);
> > +
> > +  // 'failure_order' argument, if present.
> > +  if (Ordering2Val)
> > +    Args.push_back(Ordering2Val);
> > +
> > +  // Now, the return type.
> > +  if (CASExpected) {
> > +    ResultTy = Type::getInt1Ty(Ctx);
> > +    Attr = Attr.addAttribute(Ctx, AttributeSet::ReturnIndex,
> Attribute::ZExt);
> > +  } else if (HasResult && UseSizedLibcall)
> > +    ResultTy = SizedIntTy;
> > +  else
> > +    ResultTy = Type::getVoidTy(Ctx);
> > +
> > +  // Done with setting up arguments and return types, create the call:
> > +  SmallVector<Type *, 6> ArgTys;
> > +  for (Value *Arg : Args)
> > +    ArgTys.push_back(Arg->getType());
> > +  FunctionType *FnType = FunctionType::get(ResultTy, ArgTys, false);
> > +  Constant *LibcallFn =
> > +      M->getOrInsertFunction(TLI->getLibcallName(RTLibType), FnType,
> Attr);
> > +  CallInst *Call = Builder.CreateCall(LibcallFn, Args);
> > +  Call->setAttributes(Attr);
> > +  Value *Result = Call;
> > +
> > +  // And then, extract the results...
> > +  if (ValueOperand && !UseSizedLibcall)
> > +    Builder.CreateLifetimeEnd(AllocaValue_i8, SizeVal64);
> > +
> > +  if (CASExpected) {
> > +    // The final result from the CAS is {load of 'expected' alloca,
> bool result
> > +    // from call}
> > +    Type *FinalResultTy = I->getType();
> > +    Value *V = UndefValue::get(FinalResultTy);
> > +    Value *ExpectedOut =
> > +        Builder.CreateAlignedLoad(AllocaCASExpected, AllocaAlignment);
> > +    Builder.CreateLifetimeEnd(AllocaCASExpected_i8, SizeVal64);
> > +    V = Builder.CreateInsertValue(V, ExpectedOut, 0);
> > +    V = Builder.CreateInsertValue(V, Result, 1);
> > +    I->replaceAllUsesWith(V);
> > +  } else if (HasResult) {
> > +    Value *V;
> > +    if (UseSizedLibcall)
> > +      V = Builder.CreateBitOrPointerCast(Result, I->getType());
> > +    else {
> > +      V = Builder.CreateAlignedLoad(AllocaResult, AllocaAlignment);
> > +      Builder.CreateLifetimeEnd(AllocaResult_i8, SizeVal64);
> > +    }
> > +    I->replaceAllUsesWith(V);
> > +  }
> > +  I->eraseFromParent();
> > +  return true;
> > +}
> >
> > Modified: llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp
> > URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp?rev=266002&r1=266001&r2=266002&view=diff
> >
> ==============================================================================
> > --- llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp (original)
> > +++ llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp Mon Apr 11 17:22:33
> 2016
> > @@ -405,7 +405,66 @@ static void InitLibcallNames(const char
> >    Names[RTLIB::SYNC_FETCH_AND_UMIN_4] = "__sync_fetch_and_umin_4";
> >    Names[RTLIB::SYNC_FETCH_AND_UMIN_8] = "__sync_fetch_and_umin_8";
> >    Names[RTLIB::SYNC_FETCH_AND_UMIN_16] = "__sync_fetch_and_umin_16";
> > -
> > +
> > +  Names[RTLIB::ATOMIC_LOAD] = "__atomic_load";
> > +  Names[RTLIB::ATOMIC_LOAD_1] = "__atomic_load_1";
> > +  Names[RTLIB::ATOMIC_LOAD_2] = "__atomic_load_2";
> > +  Names[RTLIB::ATOMIC_LOAD_4] = "__atomic_load_4";
> > +  Names[RTLIB::ATOMIC_LOAD_8] = "__atomic_load_8";
> > +  Names[RTLIB::ATOMIC_LOAD_16] = "__atomic_load_16";
> > +
> > +  Names[RTLIB::ATOMIC_STORE] = "__atomic_store";
> > +  Names[RTLIB::ATOMIC_STORE_1] = "__atomic_store_1";
> > +  Names[RTLIB::ATOMIC_STORE_2] = "__atomic_store_2";
> > +  Names[RTLIB::ATOMIC_STORE_4] = "__atomic_store_4";
> > +  Names[RTLIB::ATOMIC_STORE_8] = "__atomic_store_8";
> > +  Names[RTLIB::ATOMIC_STORE_16] = "__atomic_store_16";
> > +
> > +  Names[RTLIB::ATOMIC_EXCHANGE] = "__atomic_exchange";
> > +  Names[RTLIB::ATOMIC_EXCHANGE_1] = "__atomic_exchange_1";
> > +  Names[RTLIB::ATOMIC_EXCHANGE_2] = "__atomic_exchange_2";
> > +  Names[RTLIB::ATOMIC_EXCHANGE_4] = "__atomic_exchange_4";
> > +  Names[RTLIB::ATOMIC_EXCHANGE_8] = "__atomic_exchange_8";
> > +  Names[RTLIB::ATOMIC_EXCHANGE_16] = "__atomic_exchange_16";
> > +
> > +  Names[RTLIB::ATOMIC_COMPARE_EXCHANGE] = "__atomic_compare_exchange";
> > +  Names[RTLIB::ATOMIC_COMPARE_EXCHANGE_1] =
> "__atomic_compare_exchange_1";
> > +  Names[RTLIB::ATOMIC_COMPARE_EXCHANGE_2] =
> "__atomic_compare_exchange_2";
> > +  Names[RTLIB::ATOMIC_COMPARE_EXCHANGE_4] =
> "__atomic_compare_exchange_4";
> > +  Names[RTLIB::ATOMIC_COMPARE_EXCHANGE_8] =
> "__atomic_compare_exchange_8";
> > +  Names[RTLIB::ATOMIC_COMPARE_EXCHANGE_16] =
> "__atomic_compare_exchange_16";
> > +
> > +  Names[RTLIB::ATOMIC_FETCH_ADD_1] = "__atomic_fetch_add_1";
> > +  Names[RTLIB::ATOMIC_FETCH_ADD_2] = "__atomic_fetch_add_2";
> > +  Names[RTLIB::ATOMIC_FETCH_ADD_4] = "__atomic_fetch_add_4";
> > +  Names[RTLIB::ATOMIC_FETCH_ADD_8] = "__atomic_fetch_add_8";
> > +  Names[RTLIB::ATOMIC_FETCH_ADD_16] = "__atomic_fetch_add_16";
> > +  Names[RTLIB::ATOMIC_FETCH_SUB_1] = "__atomic_fetch_sub_1";
> > +  Names[RTLIB::ATOMIC_FETCH_SUB_2] = "__atomic_fetch_sub_2";
> > +  Names[RTLIB::ATOMIC_FETCH_SUB_4] = "__atomic_fetch_sub_4";
> > +  Names[RTLIB::ATOMIC_FETCH_SUB_8] = "__atomic_fetch_sub_8";
> > +  Names[RTLIB::ATOMIC_FETCH_SUB_16] = "__atomic_fetch_sub_16";
> > +  Names[RTLIB::ATOMIC_FETCH_AND_1] = "__atomic_fetch_and_1";
> > +  Names[RTLIB::ATOMIC_FETCH_AND_2] = "__atomic_fetch_and_2";
> > +  Names[RTLIB::ATOMIC_FETCH_AND_4] = "__atomic_fetch_and_4";
> > +  Names[RTLIB::ATOMIC_FETCH_AND_8] = "__atomic_fetch_and_8";
> > +  Names[RTLIB::ATOMIC_FETCH_AND_16] = "__atomic_fetch_and_16";
> > +  Names[RTLIB::ATOMIC_FETCH_OR_1] = "__atomic_fetch_or_1";
> > +  Names[RTLIB::ATOMIC_FETCH_OR_2] = "__atomic_fetch_or_2";
> > +  Names[RTLIB::ATOMIC_FETCH_OR_4] = "__atomic_fetch_or_4";
> > +  Names[RTLIB::ATOMIC_FETCH_OR_8] = "__atomic_fetch_or_8";
> > +  Names[RTLIB::ATOMIC_FETCH_OR_16] = "__atomic_fetch_or_16";
> > +  Names[RTLIB::ATOMIC_FETCH_XOR_1] = "__atomic_fetch_xor_1";
> > +  Names[RTLIB::ATOMIC_FETCH_XOR_2] = "__atomic_fetch_xor_2";
> > +  Names[RTLIB::ATOMIC_FETCH_XOR_4] = "__atomic_fetch_xor_4";
> > +  Names[RTLIB::ATOMIC_FETCH_XOR_8] = "__atomic_fetch_xor_8";
> > +  Names[RTLIB::ATOMIC_FETCH_XOR_16] = "__atomic_fetch_xor_16";
> > +  Names[RTLIB::ATOMIC_FETCH_NAND_1] = "__atomic_fetch_nand_1";
> > +  Names[RTLIB::ATOMIC_FETCH_NAND_2] = "__atomic_fetch_nand_2";
> > +  Names[RTLIB::ATOMIC_FETCH_NAND_4] = "__atomic_fetch_nand_4";
> > +  Names[RTLIB::ATOMIC_FETCH_NAND_8] = "__atomic_fetch_nand_8";
> > +  Names[RTLIB::ATOMIC_FETCH_NAND_16] = "__atomic_fetch_nand_16";
> > +
> >    if (TT.getEnvironment() == Triple::GNU) {
> >      Names[RTLIB::SINCOS_F32] = "sincosf";
> >      Names[RTLIB::SINCOS_F64] = "sincos";
> > @@ -777,6 +836,9 @@ TargetLoweringBase::TargetLoweringBase(c
> >    GatherAllAliasesMaxDepth = 6;
> >    MinStackArgumentAlignment = 1;
> >    MinimumJumpTableEntries = 4;
> > +  // TODO: the default will be switched to 0 in the next commit, along
> > +  // with the Target-specific changes necessary.
> > +  MaxAtomicSizeInBitsSupported = 1024;
> >
> >    InitLibcallNames(LibcallRoutineNames, TM.getTargetTriple());
> >    InitCmpLibcallCCs(CmpLibcallCCs);
> >
> > Modified: llvm/trunk/lib/Target/Sparc/SparcISelLowering.cpp
> > URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Sparc/SparcISelLowering.cpp?rev=266002&r1=266001&r2=266002&view=diff
> >
> ==============================================================================
> > --- llvm/trunk/lib/Target/Sparc/SparcISelLowering.cpp (original)
> > +++ llvm/trunk/lib/Target/Sparc/SparcISelLowering.cpp Mon Apr 11
> 17:22:33 2016
> > @@ -1611,6 +1611,13 @@ SparcTargetLowering::SparcTargetLowering
> >    }
> >
> >    // ATOMICs.
> > +  // Atomics are only supported on Sparcv9. (32bit atomics are also
> > +  // supported by the Leon sparcv8 variant, but we don't support that
> > +  // yet.)
> > +  if (Subtarget->isV9())
> > +    setMaxAtomicSizeInBitsSupported(64);
> > +  else
> > +    setMaxAtomicSizeInBitsSupported(0);
> >
> >    setOperationAction(ISD::ATOMIC_SWAP, MVT::i32, Legal);
> >    setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i32,
> >
> > Added: llvm/trunk/test/Transforms/AtomicExpand/SPARC/libcalls.ll
> > URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/AtomicExpand/SPARC/libcalls.ll?rev=266002&view=auto
> >
> ==============================================================================
> > --- llvm/trunk/test/Transforms/AtomicExpand/SPARC/libcalls.ll (added)
> > +++ llvm/trunk/test/Transforms/AtomicExpand/SPARC/libcalls.ll Mon Apr 11
> 17:22:33 2016
> > @@ -0,0 +1,257 @@
> > +; RUN: opt -S %s -atomic-expand | FileCheck %s
> > +
> > +;;; NOTE: this test is actually target-independent -- any target which
> > +;;; doesn't support inline atomics can be used. (E.g. X86 i386 would
> > +;;; work, if LLVM is properly taught about what it's missing vs i586.)
> > +
> > +;target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
> > +;target triple = "i386-unknown-unknown"
> > +target datalayout = "e-m:e-p:32:32-i64:64-f128:64-n32-S64"
> > +target triple = "sparc-unknown-unknown"
> > +
> > +;; First, check the sized calls. Except for cmpxchg, these are fairly
> > +;; straightforward.
> > +
> > +; CHECK-LABEL: @test_load_i16(
> > +; CHECK:  %1 = bitcast i16* %arg to i8*
> > +; CHECK:  %2 = call i16 @__atomic_load_2(i8* %1, i32 5)
> > +; CHECK:  ret i16 %2
> > +define i16 @test_load_i16(i16* %arg) {
> > +  %ret = load atomic i16, i16* %arg seq_cst, align 4
> > +  ret i16 %ret
> > +}
> > +
> > +; CHECK-LABEL: @test_store_i16(
> > +; CHECK:  %1 = bitcast i16* %arg to i8*
> > +; CHECK:  call void @__atomic_store_2(i8* %1, i16 %val, i32 5)
> > +; CHECK:  ret void
> > +define void @test_store_i16(i16* %arg, i16 %val) {
> > +  store atomic i16 %val, i16* %arg seq_cst, align 4
> > +  ret void
> > +}
> > +
> > +; CHECK-LABEL: @test_exchange_i16(
> > +; CHECK:  %1 = bitcast i16* %arg to i8*
> > +; CHECK:  %2 = call i16 @__atomic_exchange_2(i8* %1, i16 %val, i32 5)
> > +; CHECK:  ret i16 %2
> > +define i16 @test_exchange_i16(i16* %arg, i16 %val) {
> > +  %ret = atomicrmw xchg i16* %arg, i16 %val seq_cst
> > +  ret i16 %ret
> > +}
> > +
> > +; CHECK-LABEL: @test_cmpxchg_i16(
> > +; CHECK:  %1 = bitcast i16* %arg to i8*
> > +; CHECK:  %2 = alloca i16, align 2
> > +; CHECK:  %3 = bitcast i16* %2 to i8*
> > +; CHECK:  call void @llvm.lifetime.start(i64 2, i8* %3)
> > +; CHECK:  store i16 %old, i16* %2, align 2
> > +; CHECK:  %4 = call zeroext i1 @__atomic_compare_exchange_2(i8* %1, i8*
> %3, i16 %new, i32 5, i32 0)
> > +; CHECK:  %5 = load i16, i16* %2, align 2
> > +; CHECK:  call void @llvm.lifetime.end(i64 2, i8* %3)
> > +; CHECK:  %6 = insertvalue { i16, i1 } undef, i16 %5, 0
> > +; CHECK:  %7 = insertvalue { i16, i1 } %6, i1 %4, 1
> > +; CHECK:  %ret = extractvalue { i16, i1 } %7, 0
> > +; CHECK:  ret i16 %ret
> > +define i16 @test_cmpxchg_i16(i16* %arg, i16 %old, i16 %new) {
> > +  %ret_succ = cmpxchg i16* %arg, i16 %old, i16 %new seq_cst monotonic
> > +  %ret = extractvalue { i16, i1 } %ret_succ, 0
> > +  ret i16 %ret
> > +}
> > +
> > +; CHECK-LABEL: @test_add_i16(
> > +; CHECK:  %1 = bitcast i16* %arg to i8*
> > +; CHECK:  %2 = call i16 @__atomic_fetch_add_2(i8* %1, i16 %val, i32 5)
> > +; CHECK:  ret i16 %2
> > +define i16 @test_add_i16(i16* %arg, i16 %val) {
> > +  %ret = atomicrmw add i16* %arg, i16 %val seq_cst
> > +  ret i16 %ret
> > +}
> > +
> > +
> > +;; Now, check the output for the unsized libcalls. i128 is used for
> > +;; these tests because the "16" suffixed functions aren't available on
> > +;; 32-bit i386.
> > +
> > +; CHECK-LABEL: @test_load_i128(
> > +; CHECK:  %1 = bitcast i128* %arg to i8*
> > +; CHECK:  %2 = alloca i128, align 8
> > +; CHECK:  %3 = bitcast i128* %2 to i8*
> > +; CHECK:  call void @llvm.lifetime.start(i64 16, i8* %3)
> > +; CHECK:  call void @__atomic_load(i32 16, i8* %1, i8* %3, i32 5)
> > +; CHECK:  %4 = load i128, i128* %2, align 8
> > +; CHECK:  call void @llvm.lifetime.end(i64 16, i8* %3)
> > +; CHECK:  ret i128 %4
> > +define i128 @test_load_i128(i128* %arg) {
> > +  %ret = load atomic i128, i128* %arg seq_cst, align 16
> > +  ret i128 %ret
> > +}
> > +
> > +; CHECK-LABEL: @test_store_i128(
> > +; CHECK:  %1 = bitcast i128* %arg to i8*
> > +; CHECK:  %2 = alloca i128, align 8
> > +; CHECK:  %3 = bitcast i128* %2 to i8*
> > +; CHECK:  call void @llvm.lifetime.start(i64 16, i8* %3)
> > +; CHECK:  store i128 %val, i128* %2, align 8
> > +; CHECK:  call void @__atomic_store(i32 16, i8* %1, i8* %3, i32 5)
> > +; CHECK:  call void @llvm.lifetime.end(i64 16, i8* %3)
> > +; CHECK:  ret void
> > +define void @test_store_i128(i128* %arg, i128 %val) {
> > +  store atomic i128 %val, i128* %arg seq_cst, align 16
> > +  ret void
> > +}
> > +
> > +; CHECK-LABEL: @test_exchange_i128(
> > +; CHECK:  %1 = bitcast i128* %arg to i8*
> > +; CHECK:  %2 = alloca i128, align 8
> > +; CHECK:  %3 = bitcast i128* %2 to i8*
> > +; CHECK:  call void @llvm.lifetime.start(i64 16, i8* %3)
> > +; CHECK:  store i128 %val, i128* %2, align 8
> > +; CHECK:  %4 = alloca i128, align 8
> > +; CHECK:  %5 = bitcast i128* %4 to i8*
> > +; CHECK:  call void @llvm.lifetime.start(i64 16, i8* %5)
> > +; CHECK:  call void @__atomic_exchange(i32 16, i8* %1, i8* %3, i8* %5,
> i32 5)
> > +; CHECK:  call void @llvm.lifetime.end(i64 16, i8* %3)
> > +; CHECK:  %6 = load i128, i128* %4, align 8
> > +; CHECK:  call void @llvm.lifetime.end(i64 16, i8* %5)
> > +; CHECK:  ret i128 %6
> > +define i128 @test_exchange_i128(i128* %arg, i128 %val) {
> > +  %ret = atomicrmw xchg i128* %arg, i128 %val seq_cst
> > +  ret i128 %ret
> > +}
> > +
> > +; CHECK-LABEL: @test_cmpxchg_i128(
> > +; CHECK:  %1 = bitcast i128* %arg to i8*
> > +; CHECK:  %2 = alloca i128, align 8
> > +; CHECK:  %3 = bitcast i128* %2 to i8*
> > +; CHECK:  call void @llvm.lifetime.start(i64 16, i8* %3)
> > +; CHECK:  store i128 %old, i128* %2, align 8
> > +; CHECK:  %4 = alloca i128, align 8
> > +; CHECK:  %5 = bitcast i128* %4 to i8*
> > +; CHECK:  call void @llvm.lifetime.start(i64 16, i8* %5)
> > +; CHECK:  store i128 %new, i128* %4, align 8
> > +; CHECK:  %6 = call zeroext i1 @__atomic_compare_exchange(i32 16, i8*
> %1, i8* %3, i8* %5, i32 5, i32 0)
> > +; CHECK:  call void @llvm.lifetime.end(i64 16, i8* %5)
> > +; CHECK:  %7 = load i128, i128* %2, align 8
> > +; CHECK:  call void @llvm.lifetime.end(i64 16, i8* %3)
> > +; CHECK:  %8 = insertvalue { i128, i1 } undef, i128 %7, 0
> > +; CHECK:  %9 = insertvalue { i128, i1 } %8, i1 %6, 1
> > +; CHECK:  %ret = extractvalue { i128, i1 } %9, 0
> > +; CHECK:  ret i128 %ret
> > +define i128 @test_cmpxchg_i128(i128* %arg, i128 %old, i128 %new) {
> > +  %ret_succ = cmpxchg i128* %arg, i128 %old, i128 %new seq_cst monotonic
> > +  %ret = extractvalue { i128, i1 } %ret_succ, 0
> > +  ret i128 %ret
> > +}
> > +
> > +; This one is a verbose expansion, as there is no generic
> > +; __atomic_fetch_add function, so it needs to expand to a cmpxchg
> > +; loop, which then itself expands into a libcall.
> > +
> > +; CHECK-LABEL: @test_add_i128(
> > +; CHECK:  %1 = alloca i128, align 8
> > +; CHECK:  %2 = alloca i128, align 8
> > +; CHECK:  %3 = load i128, i128* %arg, align 16
> > +; CHECK:  br label %atomicrmw.start
> > +; CHECK:atomicrmw.start:
> > +; CHECK:  %loaded = phi i128 [ %3, %0 ], [ %newloaded, %atomicrmw.start
> ]
> > +; CHECK:  %new = add i128 %loaded, %val
> > +; CHECK:  %4 = bitcast i128* %arg to i8*
> > +; CHECK:  %5 = bitcast i128* %1 to i8*
> > +; CHECK:  call void @llvm.lifetime.start(i64 16, i8* %5)
> > +; CHECK:  store i128 %loaded, i128* %1, align 8
> > +; CHECK:  %6 = bitcast i128* %2 to i8*
> > +; CHECK:  call void @llvm.lifetime.start(i64 16, i8* %6)
> > +; CHECK:  store i128 %new, i128* %2, align 8
> > +; CHECK:  %7 = call zeroext i1 @__atomic_compare_exchange(i32 16, i8*
> %4, i8* %5, i8* %6, i32 5, i32 5)
> > +; CHECK:  call void @llvm.lifetime.end(i64 16, i8* %6)
> > +; CHECK:  %8 = load i128, i128* %1, align 8
> > +; CHECK:  call void @llvm.lifetime.end(i64 16, i8* %5)
> > +; CHECK:  %9 = insertvalue { i128, i1 } undef, i128 %8, 0
> > +; CHECK:  %10 = insertvalue { i128, i1 } %9, i1 %7, 1
> > +; CHECK:  %success = extractvalue { i128, i1 } %10, 1
> > +; CHECK:  %newloaded = extractvalue { i128, i1 } %10, 0
> > +; CHECK:  br i1 %success, label %atomicrmw.end, label %atomicrmw.start
> > +; CHECK:atomicrmw.end:
> > +; CHECK:  ret i128 %newloaded
> > +define i128 @test_add_i128(i128* %arg, i128 %val) {
> > +  %ret = atomicrmw add i128* %arg, i128 %val seq_cst
> > +  ret i128 %ret
> > +}
> > +
> > +;; Ensure that non-integer types get bitcast correctly on the way in
> and out of a libcall:
> > +
> > +; CHECK-LABEL: @test_load_double(
> > +; CHECK:  %1 = bitcast double* %arg to i8*
> > +; CHECK:  %2 = call i64 @__atomic_load_8(i8* %1, i32 5)
> > +; CHECK:  %3 = bitcast i64 %2 to double
> > +; CHECK:  ret double %3
> > +define double @test_load_double(double* %arg, double %val) {
> > +  %1 = load atomic double, double* %arg seq_cst, align 16
> > +  ret double %1
> > +}
> > +
> > +; CHECK-LABEL: @test_store_double(
> > +; CHECK:  %1 = bitcast double* %arg to i8*
> > +; CHECK:  %2 = bitcast double %val to i64
> > +; CHECK:  call void @__atomic_store_8(i8* %1, i64 %2, i32 5)
> > +; CHECK:  ret void
> > +define void @test_store_double(double* %arg, double %val) {
> > +  store atomic double %val, double* %arg seq_cst, align 16
> > +  ret void
> > +}
> > +
> > +; CHECK-LABEL: @test_cmpxchg_ptr(
> > +; CHECK:   %1 = bitcast i16** %arg to i8*
> > +; CHECK:   %2 = alloca i16*, align 4
> > +; CHECK:   %3 = bitcast i16** %2 to i8*
> > +; CHECK:   call void @llvm.lifetime.start(i64 4, i8* %3)
> > +; CHECK:   store i16* %old, i16** %2, align 4
> > +; CHECK:   %4 = ptrtoint i16* %new to i32
> > +; CHECK:   %5 = call zeroext i1 @__atomic_compare_exchange_4(i8* %1,
> i8* %3, i32 %4, i32 5, i32 2)
> > +; CHECK:   %6 = load i16*, i16** %2, align 4
> > +; CHECK:   call void @llvm.lifetime.end(i64 4, i8* %3)
> > +; CHECK:   %7 = insertvalue { i16*, i1 } undef, i16* %6, 0
> > +; CHECK:   %8 = insertvalue { i16*, i1 } %7, i1 %5, 1
> > +; CHECK:   %ret = extractvalue { i16*, i1 } %8, 0
> > +; CHECK:   ret i16* %ret
> > +; CHECK: }
> > +define i16* @test_cmpxchg_ptr(i16** %arg, i16* %old, i16* %new) {
> > +  %ret_succ = cmpxchg i16** %arg, i16* %old, i16* %new seq_cst acquire
> > +  %ret = extractvalue { i16*, i1 } %ret_succ, 0
> > +  ret i16* %ret
> > +}
> > +
> > +;; ...and for a non-integer type of large size too.
> > +
> > +; CHECK-LABEL: @test_store_fp128
> > +; CHECK:   %1 = bitcast fp128* %arg to i8*
> > +; CHECK:  %2 = alloca fp128, align 8
> > +; CHECK:  %3 = bitcast fp128* %2 to i8*
> > +; CHECK:  call void @llvm.lifetime.start(i64 16, i8* %3)
> > +; CHECK:  store fp128 %val, fp128* %2, align 8
> > +; CHECK:  call void @__atomic_store(i32 16, i8* %1, i8* %3, i32 5)
> > +; CHECK:  call void @llvm.lifetime.end(i64 16, i8* %3)
> > +; CHECK:  ret void
> > +define void @test_store_fp128(fp128* %arg, fp128 %val) {
> > +  store atomic fp128 %val, fp128* %arg seq_cst, align 16
> > +  ret void
> > +}
> > +
> > +;; Unaligned loads and stores should be expanded to the generic
> > +;; libcall, just like large loads/stores, and not a specialized one.
> > +;; NOTE: atomicrmw and cmpxchg don't yet support an align attribute;
> > +;; when such support is added, they should also be tested here.
> > +
> > +; CHECK-LABEL: @test_unaligned_load_i16(
> > +; CHECK:  __atomic_load(
> > +define i16 @test_unaligned_load_i16(i16* %arg) {
> > +  %ret = load atomic i16, i16* %arg seq_cst, align 1
> > +  ret i16 %ret
> > +}
> > +
> > +; CHECK-LABEL: @test_unaligned_store_i16(
> > +; CHECK: __atomic_store(
> > +define void @test_unaligned_store_i16(i16* %arg, i16 %val) {
> > +  store atomic i16 %val, i16* %arg seq_cst, align 1
> > +  ret void
> > +}
> >
> > Added: llvm/trunk/test/Transforms/AtomicExpand/SPARC/lit.local.cfg
> > URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/AtomicExpand/SPARC/lit.local.cfg?rev=266002&view=auto
> >
> ==============================================================================
> > --- llvm/trunk/test/Transforms/AtomicExpand/SPARC/lit.local.cfg (added)
> > +++ llvm/trunk/test/Transforms/AtomicExpand/SPARC/lit.local.cfg Mon Apr
> 11 17:22:33 2016
> > @@ -0,0 +1,2 @@
> > +if not 'Sparc' in config.root.targets:
> > +  config.unsupported = True
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>

