[clang] [clang][ARM] Fix build failure in <arm_acle.h> for __swp (PR #151354)

Thu Jul 31 10:22:21 PDT 2025

================
@@ -55,11 +55,27 @@ __chkfeat(uint64_t __features) {
 /* 7.5 Swap */
 static __inline__ uint32_t __attribute__((__always_inline__, __nodebug__))
 __swp(uint32_t __x, volatile uint32_t *__p) {
----------------
efriedma-quic wrote:

"Relaxed" basically means we don't emit any barriers, so the compiler and the CPU can reorder memory operations on unrelated addresses across the swap.
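A minimal sketch of what "relaxed" buys and costs, assuming a GCC/Clang-compatible compiler: the hypothetical helpers below differ only in the memory order passed to `__atomic_exchange_n`. The relaxed one is still atomic but carries no barriers. That is fine for a plain swap, but not for something like taking a lock, where the caller needs at least acquire ordering. On ARMv7, for example, the acquire form typically gets a `dmb` after the exclusive loop, while the relaxed form gets none.

```c
#include <stdint.h>

/* Hypothetical helpers; the names are illustrative, not from arm_acle.h. */

/* Relaxed: atomic, but no barriers. Loads and stores to other addresses
   may be reordered across this exchange by the compiler or the CPU. */
static inline uint32_t swap_relaxed(volatile uint32_t *p, uint32_t x) {
  return __atomic_exchange_n(p, x, __ATOMIC_RELAXED);
}

/* Acquire: later memory operations cannot be moved above the exchange,
   which is what a lock-taking swap needs. */
static inline uint32_t swap_acquire(volatile uint32_t *p, uint32_t x) {
  return __atomic_exchange_n(p, x, __ATOMIC_ACQUIRE);
}

/* Usage: a toy spinlock. It depends on acquire/release ordering, so it
   must not be built on swap_relaxed. */
static void toy_lock(volatile uint32_t *l) {
  while (swap_acquire(l, 1) != 0)
    ; /* spin until the previous value was 0 (unlocked) */
}

static void toy_unlock(volatile uint32_t *l) {
  __atomic_store_n(l, 0, __ATOMIC_RELEASE);
}
```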

In the "modern" case (targets that have ldrex/strex), it might make sense to use the atomicrmw sequence as well: it should lower to the same thing, the compiler understands atomics better than it understands ldrex and strex, and the rules for when ldrex and strex are well-defined are generally weird.
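To make the comparison concrete, here is a sketch of the two strategies side by side; both function names are hypothetical. The first is a hand-written LDREX/STREX loop and is only compiled when `__ARM_FEATURE_LDREX` reports 32-bit exclusives. The second uses the `__atomic_exchange_n` builtin, for which Clang should emit an `atomicrmw xchg` with monotonic (relaxed) ordering; on targets with LDREX/STREX the backend expands that into an equivalent exclusive loop, but the optimizer sees a single atomic operation rather than two opaque intrinsics.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_LDREX) && (__ARM_FEATURE_LDREX & 4)
/* Hand-written exclusive-access loop. Only built for ARM targets that
   support 32-bit LDREX/STREX. */
static inline uint32_t swp_exclusive(uint32_t x, volatile uint32_t *p) {
  uint32_t old;
  do
    old = __builtin_arm_ldrex(p);
  while (__builtin_arm_strex(x, p)); /* nonzero: exclusivity lost, retry */
  return old;
}
#endif

/* Builtin form: one relaxed atomic exchange. The compiler chooses the
   instruction sequence (or a libcall) appropriate for the target. */
static inline uint32_t swp_atomic(uint32_t x, volatile uint32_t *p) {
  return __atomic_exchange_n(p, x, __ATOMIC_RELAXED);
}
```

The argument order `(x, p)` mirrors `__swp`; the builtin itself takes the pointer first.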

The backend can't select `atomicrmw volatile xchg` to SWP on targets that don't have lock-free atomics: if a target doesn't support cmpxchg for a given width, we have to go through libatomic for all atomics of that width so the locking works consistently.
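To illustrate the consistency argument, here is a deliberately simplified sketch of lock-based emulation, assuming a runtime with no native 4-byte compare-and-swap. All names are hypothetical. A single global lock stands in for the per-address lock table that real libatomic implementations typically use, and the `atomic_flag` spinlock exists only so the sketch runs on a host compiler; a real runtime for such a target would need some other mutual-exclusion mechanism.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the runtime's lock (real libatomic hashes the
   address into a table of locks; one lock keeps the sketch short). */
static atomic_flag emu_lock = ATOMIC_FLAG_INIT;

static void emu_acquire(void) {
  while (atomic_flag_test_and_set_explicit(&emu_lock, memory_order_acquire))
    ; /* spin */
}

static void emu_release(void) {
  atomic_flag_clear_explicit(&emu_lock, memory_order_release);
}

/* Every 4-byte atomic operation goes through the same lock... */
static uint32_t emu_exchange_4(volatile uint32_t *p, uint32_t x) {
  emu_acquire();
  uint32_t old = *p;
  *p = x;
  emu_release();
  return old;
}

/* ...including compare-and-swap, which is a separate read and write. */
static int emu_cas_4(volatile uint32_t *p, uint32_t *expected,
                     uint32_t desired) {
  emu_acquire();
  uint32_t cur = *p;
  int ok = (cur == *expected);
  if (ok)
    *p = desired;
  else
    *expected = cur;
  emu_release();
  return ok;
}
```

If the compiler emitted a raw SWP for the exchange instead of calling `emu_exchange_4`, that SWP would not take `emu_lock`. It could land between the read and the write inside `emu_cas_4`, and the CAS would then overwrite the swapped-in value. That is why every atomic of a given width has to take the same path.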

There are some targets that have lock-free atomics even though they don't have ldrex/strex. In particular, on armv6 in Thumb mode, we can just switch to ARM mode, where ldrex/strex are available. And on all Linux targets, the kernel exposes helper routines (the "kuser helpers") that implement atomic ops. On those targets, it's probably better to use the `__sync_*` libcall rather than SWP, for the sake of forward compatibility with systems that don't support SWP. Performance should be reasonable: it's a libcall, but it doesn't involve any locks or anything like that.
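The following hedged sketch shows how an exchange can be built from nothing but compare-and-swap, which is roughly the shape of the `__sync_*` routines on ARM Linux targets without ldrex/strex. As we understand it, libgcc implements them as a CAS loop around the kernel's `__kernel_cmpxchg` user helper, and a 4-byte exchange is typically reached through a libcall named `__sync_lock_test_and_set_4`. Here the portable `__sync_val_compare_and_swap` builtin stands in for the kernel helper so the code runs on any GCC/Clang host; `swap_via_cas` is a hypothetical name.

```c
#include <stdint.h>

/* Exchange built from a CAS retry loop. No lock is taken: a thread that
   loses the race just rereads the value and tries again. */
static uint32_t swap_via_cas(volatile uint32_t *p, uint32_t x) {
  uint32_t old;
  do {
    old = *p; /* snapshot the current value */
  } while (__sync_val_compare_and_swap(p, old, x) != old);
  return old; /* the value that was replaced */
}
```

Because the loop only ever calls the CAS primitive, it keeps working on newer systems where the kernel helper is implemented with LDREX/STREX, which is the forward-compatibility point above.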

https://github.com/llvm/llvm-project/pull/151354