[clang] [compiler-rt] [libcxx] [Clang] Support atomic operations on _BitInt(N) (PR #204815)
Xavier Roche via cfe-commits
cfe-commits at lists.llvm.org
Sat Jun 27 00:57:42 PDT 2026
https://github.com/xroche updated https://github.com/llvm/llvm-project/pull/204815
From 5c7fc853303e60e1c39d6ea49d5fa6ad445abc16 Mon Sep 17 00:00:00 2001
From: Xavier Roche <xavier.roche at algolia.com>
Date: Fri, 19 Jun 2026 13:59:45 +0200
Subject: [PATCH 1/9] [Clang][POC] Atomic operations on _BitInt(N)
_BitInt(N) was rejected by every atomic path: the _Atomic(...) type
specifier, the __c11_atomic_*/__atomic_* builtins, and transitively
std::atomic. Two blanket isBitIntType() checks disabled it, dating to the
type's introduction (the __atomic builtin half is D84049). __int128, the
closest analogue, was allowed at both sites.
Lift both rejections so _BitInt flows through the normal integer path.
Load, store, exchange, compare-exchange, and bitwise read-modify-write are
then correct at every width through the existing canonicalizing store and
the libcall fallback.
Arithmetic read-modify-write needs more care. A single atomicrmw on the
padded memory integer of a non-byte-aligned width (e.g. _BitInt(37) in an
i64) carries into the padding bits, leaving a non-canonical value that
breaks a later compare-exchange and gives wrong signed min/max. A wide
arithmetic fetch (e.g. _BitInt(256)) hit an llvm_unreachable in the libcall
path, a compiler crash. Both are fixed by emitting a compare-exchange loop
that computes the new value at width N via llvm::buildAtomicRMWValue and
writes back a canonical representation, reusing the existing
EmitAtomicCompareExchange helper, which selects the inline cmpxchg or the
__atomic_compare_exchange libcall by size. No-padding inline widths (64,
128) keep the direct atomicrmw fast path.
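
For readers who want to see the padding problem concretely, here is a minimal
sketch in portable C11. It does not use _BitInt at all: it assumes a signed
37-bit value is stored sign-extended in a 64-bit word and models that storage
with a plain _Atomic uint64_t. The helper names (canon37, naive_fetch_add37,
loop_fetch_add37, demo) are made up for illustration and are not part of the
patch; the loop only mirrors the shape described above (relaxed initial load,
width-N compute, weak compare-exchange).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Canonical storage form assumed here: the low 37 bits hold the value and
   bits 37..63 replicate the sign bit (bit 36). */
static uint64_t canon37(uint64_t bits) {
    const uint64_t mask = (UINT64_C(1) << 37) - 1;
    const uint64_t sign = UINT64_C(1) << 36;
    bits &= mask;
    return (bits ^ sign) - sign; /* sign-extend, modulo 2^64 */
}

/* A single 64-bit atomic add on the padded storage: the carry out of
   bit 36 lands in the padding bits. Returns the old value. */
static uint64_t naive_fetch_add37(_Atomic uint64_t *p, uint64_t rhs) {
    return atomic_fetch_add_explicit(p, rhs, memory_order_seq_cst);
}

/* Compare-exchange loop: compute at width 37, write back the canonical
   form. Returns the old value, like fetch_add. */
static uint64_t loop_fetch_add37(_Atomic uint64_t *p, uint64_t rhs) {
    uint64_t old = atomic_load_explicit(p, memory_order_relaxed);
    uint64_t desired;
    do {
        desired = canon37(old + rhs);
        /* On failure, 'old' is refreshed with the current contents. */
    } while (!atomic_compare_exchange_weak_explicit(
        p, &old, desired, memory_order_seq_cst, memory_order_seq_cst));
    return old;
}

/* Add 1 to the largest signed 37-bit value (2^36 - 1) both ways. */
static void demo(void) {
    const uint64_t max37 = UINT64_C(0x0000000FFFFFFFFF);

    _Atomic uint64_t a = max37;
    (void)naive_fetch_add37(&a, 1);
    /* Bit 36 is set but the padding is still zero: not canonical. */
    assert(atomic_load(&a) == UINT64_C(0x0000001000000000));
    assert(atomic_load(&a) != canon37(atomic_load(&a)));

    _Atomic uint64_t b = max37;
    (void)loop_fetch_add37(&b, 1);
    /* Wrapped to -2^36, stored sign-extended. */
    assert(atomic_load(&b) == UINT64_C(0xFFFFFFF000000000));
}
```

In this model, a later compare-exchange that expects the canonical encoding of
-2^36 would fail against the naive result, because the comparison covers all
64 stored bits.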
The libc++ bit-int.verify.cpp test only asserted that Clang rejected the type,
so it is removed as obsolete. It is not replaced with a positive test here: the
libc++ premerge matrix builds tests with a pinned clang that predates this
change. Whether libc++ should expose atomic _BitInt is a separate design
question for the P3666R4 discussion.
Verified against gcc-14: identical size and alignment for all widths, and
cross-compiler compare-exchange interop in both directions, confirming the
padding canonicalization matches.
Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
---
clang/docs/LanguageExtensions.rst | 9 +
clang/docs/ReleaseNotes.rst | 6 +
clang/lib/CodeGen/CGAtomic.cpp | 211 ++++++++++++++++++
clang/lib/Sema/SemaChecking.cpp | 5 -
clang/lib/Sema/SemaType.cpp | 2 -
clang/test/CodeGen/atomic-bitint.c | 90 ++++++++
clang/test/Sema/builtins.c | 4 +-
clang/test/SemaCXX/ext-int.cpp | 10 +-
libcxx/test/libcxx/atomics/bit-int.verify.cpp | 22 --
9 files changed, 322 insertions(+), 37 deletions(-)
create mode 100644 clang/test/CodeGen/atomic-bitint.c
delete mode 100644 libcxx/test/libcxx/atomics/bit-int.verify.cpp
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index d79d82a175c68..5ff076d3e48ad 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -451,6 +451,15 @@ favor of the standard type.
Note: the ABI for ``_BitInt(N)`` is still in the process of being stabilized,
so this type should not yet be used in interfaces that require ABI stability.
+``_BitInt(N)`` may be used as an atomic type: ``_Atomic(_BitInt(N))``, the
+``__c11_atomic_*`` and ``__atomic_*`` builtins, and ``std::atomic`` all accept
+it for any width. Widths the target cannot operate on inline are lowered to the
+``__atomic_*`` libcalls. For a width whose representation has padding bits (``N``
+smaller than the type's storage size in bits, e.g. ``_BitInt(37)``), arithmetic
+read-modify-write operations are emitted as a compare-exchange loop that computes
+at width ``N``, so the result wraps modulo ``2**N`` and the padding bits stay
+canonical.
+
C keywords supported in all language modes
------------------------------------------
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 7f056abfbbe24..8692da8830dff 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -265,6 +265,12 @@ Non-comprehensive list of changes in this release
- Added support for floating point and pointer values in most ``__atomic_``
builtins.
+- Atomic operations on ``_BitInt(N)`` are now supported, including
+ ``_Atomic(_BitInt(N))``, the ``__c11_atomic_*`` / ``__atomic_*`` builtins, and
+ ``std::atomic``. Widths the target cannot operate on inline use the
+ ``__atomic_*`` libcalls; arithmetic read-modify-write on a width with padding
+ bits is emitted as a compare-exchange loop computing at the value width.
+
- Added ``__builtin_stdc_rotate_left`` and ``__builtin_stdc_rotate_right``
for bit rotation of unsigned integers including ``_BitInt`` types. Rotation
counts are normalized modulo the bit-width and support negative values.
diff --git a/clang/lib/CodeGen/CGAtomic.cpp b/clang/lib/CodeGen/CGAtomic.cpp
index 270965b109943..66c059fd40e26 100644
--- a/clang/lib/CodeGen/CGAtomic.cpp
+++ b/clang/lib/CodeGen/CGAtomic.cpp
@@ -21,6 +21,7 @@
#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Intrinsics.h"
+#include "llvm/Transforms/Utils/LowerAtomic.h"
using namespace clang;
using namespace CodeGen;
@@ -558,6 +559,195 @@ static llvm::Value *EmitPostAtomicMinMax(CGBuilderTy &Builder,
return Builder.CreateSelect(Cmp, OldVal, RHS, "newval");
}
+/// Classify an atomic op as an arithmetic/bitwise read-modify-write (one that
+/// normally lowers to a single `atomicrmw`), mapping it to the matching
+/// `AtomicRMWInst::BinOp` and reporting whether the builtin returns the new
+/// value (`<op>_fetch`) rather than the old value (`fetch_<op>`). \p IsSigned
+/// selects signed vs unsigned min/max. Returns false for exchange, load, store,
+/// compare-exchange, and any non-RMW op, none of which need the _BitInt loop.
+static bool classifyBitIntRMW(AtomicExpr::AtomicOp Op, bool IsSigned,
+ llvm::AtomicRMWInst::BinOp &BinOp,
+ bool &ReturnsNew) {
+ using RMW = llvm::AtomicRMWInst;
+ switch (Op) {
+ case AtomicExpr::AO__c11_atomic_fetch_add:
+ case AtomicExpr::AO__hip_atomic_fetch_add:
+ case AtomicExpr::AO__opencl_atomic_fetch_add:
+ case AtomicExpr::AO__atomic_fetch_add:
+ case AtomicExpr::AO__scoped_atomic_fetch_add:
+ BinOp = RMW::Add, ReturnsNew = false;
+ return true;
+ case AtomicExpr::AO__atomic_add_fetch:
+ case AtomicExpr::AO__scoped_atomic_add_fetch:
+ BinOp = RMW::Add, ReturnsNew = true;
+ return true;
+ case AtomicExpr::AO__c11_atomic_fetch_sub:
+ case AtomicExpr::AO__hip_atomic_fetch_sub:
+ case AtomicExpr::AO__opencl_atomic_fetch_sub:
+ case AtomicExpr::AO__atomic_fetch_sub:
+ case AtomicExpr::AO__scoped_atomic_fetch_sub:
+ BinOp = RMW::Sub, ReturnsNew = false;
+ return true;
+ case AtomicExpr::AO__atomic_sub_fetch:
+ case AtomicExpr::AO__scoped_atomic_sub_fetch:
+ BinOp = RMW::Sub, ReturnsNew = true;
+ return true;
+ case AtomicExpr::AO__c11_atomic_fetch_and:
+ case AtomicExpr::AO__hip_atomic_fetch_and:
+ case AtomicExpr::AO__opencl_atomic_fetch_and:
+ case AtomicExpr::AO__atomic_fetch_and:
+ case AtomicExpr::AO__scoped_atomic_fetch_and:
+ BinOp = RMW::And, ReturnsNew = false;
+ return true;
+ case AtomicExpr::AO__atomic_and_fetch:
+ case AtomicExpr::AO__scoped_atomic_and_fetch:
+ BinOp = RMW::And, ReturnsNew = true;
+ return true;
+ case AtomicExpr::AO__c11_atomic_fetch_or:
+ case AtomicExpr::AO__hip_atomic_fetch_or:
+ case AtomicExpr::AO__opencl_atomic_fetch_or:
+ case AtomicExpr::AO__atomic_fetch_or:
+ case AtomicExpr::AO__scoped_atomic_fetch_or:
+ BinOp = RMW::Or, ReturnsNew = false;
+ return true;
+ case AtomicExpr::AO__atomic_or_fetch:
+ case AtomicExpr::AO__scoped_atomic_or_fetch:
+ BinOp = RMW::Or, ReturnsNew = true;
+ return true;
+ case AtomicExpr::AO__c11_atomic_fetch_xor:
+ case AtomicExpr::AO__hip_atomic_fetch_xor:
+ case AtomicExpr::AO__opencl_atomic_fetch_xor:
+ case AtomicExpr::AO__atomic_fetch_xor:
+ case AtomicExpr::AO__scoped_atomic_fetch_xor:
+ BinOp = RMW::Xor, ReturnsNew = false;
+ return true;
+ case AtomicExpr::AO__atomic_xor_fetch:
+ case AtomicExpr::AO__scoped_atomic_xor_fetch:
+ BinOp = RMW::Xor, ReturnsNew = true;
+ return true;
+ case AtomicExpr::AO__c11_atomic_fetch_nand:
+ case AtomicExpr::AO__atomic_fetch_nand:
+ case AtomicExpr::AO__scoped_atomic_fetch_nand:
+ BinOp = RMW::Nand, ReturnsNew = false;
+ return true;
+ case AtomicExpr::AO__atomic_nand_fetch:
+ case AtomicExpr::AO__scoped_atomic_nand_fetch:
+ BinOp = RMW::Nand, ReturnsNew = true;
+ return true;
+ case AtomicExpr::AO__c11_atomic_fetch_min:
+ case AtomicExpr::AO__hip_atomic_fetch_min:
+ case AtomicExpr::AO__opencl_atomic_fetch_min:
+ case AtomicExpr::AO__atomic_fetch_min:
+ case AtomicExpr::AO__scoped_atomic_fetch_min:
+ BinOp = IsSigned ? RMW::Min : RMW::UMin, ReturnsNew = false;
+ return true;
+ case AtomicExpr::AO__atomic_min_fetch:
+ case AtomicExpr::AO__scoped_atomic_min_fetch:
+ BinOp = IsSigned ? RMW::Min : RMW::UMin, ReturnsNew = true;
+ return true;
+ case AtomicExpr::AO__c11_atomic_fetch_max:
+ case AtomicExpr::AO__hip_atomic_fetch_max:
+ case AtomicExpr::AO__opencl_atomic_fetch_max:
+ case AtomicExpr::AO__atomic_fetch_max:
+ case AtomicExpr::AO__scoped_atomic_fetch_max:
+ BinOp = IsSigned ? RMW::Max : RMW::UMax, ReturnsNew = false;
+ return true;
+ case AtomicExpr::AO__atomic_max_fetch:
+ case AtomicExpr::AO__scoped_atomic_max_fetch:
+ BinOp = IsSigned ? RMW::Max : RMW::UMax, ReturnsNew = true;
+ return true;
+ case AtomicExpr::AO__atomic_fetch_uinc:
+ case AtomicExpr::AO__scoped_atomic_fetch_uinc:
+ BinOp = RMW::UIncWrap, ReturnsNew = false;
+ return true;
+ case AtomicExpr::AO__atomic_fetch_udec:
+ case AtomicExpr::AO__scoped_atomic_fetch_udec:
+ BinOp = RMW::UDecWrap, ReturnsNew = false;
+ return true;
+ default:
+ return false;
+ }
+}
+
+/// True for a `_BitInt(N)` whose value width N differs from its in-memory width
+/// (e.g. `_BitInt(37)` occupies 64 bits), so the high bits are padding.
+static bool hasBitIntPadding(QualType T, const ASTContext &C) {
+ if (const auto *BIT = T->getAs<BitIntType>())
+ return BIT->getNumBits() != C.getTypeSize(T);
+ return false;
+}
+
+/// Map a constant C ABI memory order to an llvm ordering. A non-constant order
+/// is handled conservatively with the strongest ordering.
+static llvm::AtomicOrdering atomicOrderOrSeqCst(llvm::Value *Order) {
+ auto *C = dyn_cast<llvm::ConstantInt>(Order);
+ if (!C || !llvm::isValidAtomicOrderingCABI(C->getZExtValue()))
+ return llvm::AtomicOrdering::SequentiallyConsistent;
+ switch ((llvm::AtomicOrderingCABI)C->getZExtValue()) {
+ case llvm::AtomicOrderingCABI::relaxed:
+ return llvm::AtomicOrdering::Monotonic;
+ case llvm::AtomicOrderingCABI::consume:
+ case llvm::AtomicOrderingCABI::acquire:
+ return llvm::AtomicOrdering::Acquire;
+ case llvm::AtomicOrderingCABI::release:
+ return llvm::AtomicOrdering::Release;
+ case llvm::AtomicOrderingCABI::acq_rel:
+ return llvm::AtomicOrdering::AcquireRelease;
+ case llvm::AtomicOrderingCABI::seq_cst:
+ return llvm::AtomicOrdering::SequentiallyConsistent;
+ }
+ llvm_unreachable("invalid CABI ordering");
+}
+
+/// Emit a `_BitInt(N)` atomic read-modify-write as a compare-exchange loop. A
+/// single `atomicrmw` on the padded memory integer would carry into / compare
+/// the padding bits, and no arbitrary-width `__atomic_fetch_*` libcall exists
+/// for wide widths. The loop computes the new value at width N and writes back
+/// a canonical (extended) representation via the existing cmpxchg helper, which
+/// also picks the inline-vs-libcall form by size.
+static RValue emitBitIntAtomicRMWLoop(CodeGenFunction &CGF, AtomicExpr *E,
+ Address Ptr, Address Val1,
+ QualType AtomicTy,
+ llvm::AtomicRMWInst::BinOp BinOp,
+ bool ReturnsNew, llvm::Value *Order) {
+ QualType ValTy = E->getValueType();
+ llvm::AtomicOrdering AO = atomicOrderOrSeqCst(Order);
+ llvm::AtomicOrdering Failure =
+ llvm::AtomicCmpXchgInst::getStrongestFailureOrdering(AO);
+
+ LValue AtomicLVal = CGF.MakeAddrLValue(Ptr, AtomicTy);
+ AtomicInfo Atomics(CGF, AtomicLVal);
+
+ llvm::Value *RHS =
+ CGF.EmitLoadOfScalar(CGF.MakeAddrLValue(Val1, ValTy), E->getExprLoc());
+
+ RValue OldRV = Atomics.EmitAtomicLoad(
+ AggValueSlot::ignored(), E->getExprLoc(),
+ /*AsValue=*/true, llvm::AtomicOrdering::Monotonic, E->isVolatile());
+ llvm::Value *Init = OldRV.getScalarVal();
+
+ llvm::BasicBlock *StartBB = CGF.Builder.GetInsertBlock();
+ llvm::BasicBlock *LoopBB = CGF.createBasicBlock("atomicrmw.start", CGF.CurFn);
+ llvm::BasicBlock *EndBB = CGF.createBasicBlock("atomicrmw.end", CGF.CurFn);
+ CGF.Builder.CreateBr(LoopBB);
+ CGF.Builder.SetInsertPoint(LoopBB);
+
+ llvm::PHINode *Old = CGF.Builder.CreatePHI(Init->getType(), 2);
+ Old->addIncoming(Init, StartBB);
+
+ // Compute at the value width via the canonical RMW lowering, so the result
+ // wraps mod 2^N and never touches the padding bits.
+ llvm::Value *New = llvm::buildAtomicRMWValue(BinOp, CGF.Builder, Old, RHS);
+
+ auto Res = Atomics.EmitAtomicCompareExchange(
+ RValue::get(Old), RValue::get(New), AO, Failure, /*IsWeak=*/true);
+ Old->addIncoming(Res.first.getScalarVal(), CGF.Builder.GetInsertBlock());
+ CGF.Builder.CreateCondBr(Res.second, EndBB, LoopBB);
+
+ CGF.Builder.SetInsertPoint(EndBB);
+ return RValue::get(ReturnsNew ? New : static_cast<llvm::Value *>(Old));
+}
+
static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
Address Ptr, Address Val1, Address Val2,
Address ExpectedResult, llvm::Value *IsWeak,
@@ -1109,6 +1299,27 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
LValue AtomicVal = MakeAddrLValue(Ptr, AtomicTy);
AtomicInfo Atomics(*this, AtomicVal);
+ // A `_BitInt(N)` read-modify-write whose value width has padding bits, or
+ // whose size forces a libcall, cannot use a single atomicrmw: the op would
+ // carry into / compare the padding bits, and no arbitrary-width
+ // __atomic_fetch_* libcall exists. Emit a compare-exchange loop instead.
+ // Bitwise and/or/xor are exact even with padding, so only the wide case needs
+ // the loop for them. load/store/exchange/compare_exchange keep their paths.
+ if (MemTy->isBitIntType()) {
+ llvm::AtomicRMWInst::BinOp BinOp;
+ bool RMWReturnsNew;
+ if (classifyBitIntRMW(E->getOp(), MemTy->isSignedIntegerType(), BinOp,
+ RMWReturnsNew)) {
+ bool WideOrNonPow2 = (Size & (Size - 1)) != 0 || Size > 16;
+ bool Bitwise = BinOp == llvm::AtomicRMWInst::And ||
+ BinOp == llvm::AtomicRMWInst::Or ||
+ BinOp == llvm::AtomicRMWInst::Xor;
+ if (WideOrNonPow2 || (hasBitIntPadding(MemTy, getContext()) && !Bitwise))
+ return emitBitIntAtomicRMWLoop(*this, E, Ptr, Val1, AtomicTy, BinOp,
+ RMWReturnsNew, Order);
+ }
+ }
+
Address OriginalVal1 = Val1;
if (ShouldCastToIntPtrTy) {
Ptr = Atomics.castToAtomicIntPointer(Ptr);
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index b8a3f48a32f24..874ce2bf1ce3a 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -5460,11 +5460,6 @@ ExprResult Sema::BuildAtomicExpr(SourceRange CallRange, SourceRange ExprRange,
? 0
: 1);
- if (ValType->isBitIntType()) {
- Diag(Ptr->getExprLoc(), diag::err_atomic_builtin_bit_int_prohibit);
- return ExprError();
- }
-
return AE;
}
diff --git a/clang/lib/Sema/SemaType.cpp b/clang/lib/Sema/SemaType.cpp
index d2bb312feadc1..4a3506c281acf 100644
--- a/clang/lib/Sema/SemaType.cpp
+++ b/clang/lib/Sema/SemaType.cpp
@@ -10412,8 +10412,6 @@ QualType Sema::BuildAtomicType(QualType T, SourceLocation Loc) {
else if (!T.isTriviallyCopyableType(Context) && getLangOpts().CPlusPlus)
// Some other non-trivially-copyable type (probably a C++ class)
DisallowedKind = 7;
- else if (T->isBitIntType())
- DisallowedKind = 8;
else if (getLangOpts().C23 && T->isUndeducedAutoType())
// _Atomic auto is prohibited in C23
DisallowedKind = 9;
diff --git a/clang/test/CodeGen/atomic-bitint.c b/clang/test/CodeGen/atomic-bitint.c
new file mode 100644
index 0000000000000..358b530e8a792
--- /dev/null
+++ b/clang/test/CodeGen/atomic-bitint.c
@@ -0,0 +1,90 @@
+// RUN: %clang_cc1 -std=c23 -triple x86_64-unknown-linux-gnu -emit-llvm %s -o - | FileCheck %s
+//
+// Atomic operations on _BitInt(N). load/store/exchange/compare-exchange and
+// bitwise RMW lower directly; arithmetic RMW on a padded width and any RMW on a
+// width too wide for an inline atomicrmw lower to a compare-exchange loop that
+// computes at the value width.
+
+typedef _BitInt(37) S37;
+typedef unsigned _BitInt(37) U37;
+typedef _BitInt(64) S64;
+typedef _BitInt(128) S128;
+typedef _BitInt(256) S256;
+
+// CHECK-LABEL: @ld37(
+// CHECK: load atomic i64
+S37 ld37(_Atomic(S37) *p) { return __c11_atomic_load(p, __ATOMIC_SEQ_CST); }
+
+// CHECK-LABEL: @st37(
+// CHECK: store atomic i64
+void st37(_Atomic(S37) *p, S37 v) { __c11_atomic_store(p, v, __ATOMIC_SEQ_CST); }
+
+// CHECK-LABEL: @xchg37(
+// CHECK: atomicrmw xchg ptr {{.*}} i64
+S37 xchg37(_Atomic(S37) *p, S37 v) {
+ return __c11_atomic_exchange(p, v, __ATOMIC_SEQ_CST);
+}
+
+// CHECK-LABEL: @cas37(
+// CHECK: cmpxchg ptr {{.*}} i64
+_Bool cas37(_Atomic(S37) *p, S37 *e, S37 d) {
+ return __c11_atomic_compare_exchange_strong(p, e, d, __ATOMIC_SEQ_CST,
+ __ATOMIC_SEQ_CST);
+}
+
+// Bitwise RMW on a padded width keeps the direct atomicrmw: it is exact.
+// CHECK-LABEL: @and37(
+// CHECK: atomicrmw and ptr {{.*}} i64
+// CHECK-NOT: cmpxchg
+S37 and37(_Atomic(S37) *p, S37 v) {
+ return __c11_atomic_fetch_and(p, v, __ATOMIC_SEQ_CST);
+}
+
+// Arithmetic RMW on a padded width becomes a compare-exchange loop, not a bare
+// atomicrmw that would carry into the padding bits.
+// CHECK-LABEL: @add37(
+// CHECK: atomicrmw.start:
+// CHECK: cmpxchg weak ptr {{.*}} i64
+// CHECK-NOT: atomicrmw add
+S37 add37(_Atomic(S37) *p, S37 v) {
+ return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
+}
+
+// Signed min is computed at the value width, so the sign bit is at bit N-1.
+// CHECK-LABEL: @min37(
+// CHECK: icmp sle i37
+// CHECK: select i1
+// CHECK: cmpxchg weak ptr {{.*}} i64
+U37 min37(_Atomic(S37) *p, S37 v) {
+ return __c11_atomic_fetch_min(p, v, __ATOMIC_SEQ_CST);
+}
+
+// No padding: direct atomicrmw, no loop.
+// CHECK-LABEL: @add64(
+// CHECK: atomicrmw add ptr {{.*}} i64
+// CHECK-NOT: cmpxchg
+S64 add64(_Atomic(S64) *p, S64 v) {
+ return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
+}
+
+// CHECK-LABEL: @add128(
+// CHECK: atomicrmw add ptr {{.*}} i128
+S128 add128(_Atomic(S128) *p, S128 v) {
+ return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
+}
+
+// Wide: no inline atomicrmw and no arbitrary-width __atomic_fetch_add libcall,
+// so the loop calls __atomic_compare_exchange.
+// CHECK-LABEL: @add256(
+// CHECK: call {{.*}}@__atomic_compare_exchange
+// CHECK-NOT: cmpxchg
+S256 add256(_Atomic(S256) *p, S256 v) {
+ return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
+}
+
+// Wide bitwise also needs the loop: the wide path has no inline atomicrmw.
+// CHECK-LABEL: @or256(
+// CHECK: call {{.*}}@__atomic_compare_exchange
+S256 or256(_Atomic(S256) *p, S256 v) {
+ return __c11_atomic_fetch_or(p, v, __ATOMIC_SEQ_CST);
+}
diff --git a/clang/test/Sema/builtins.c b/clang/test/Sema/builtins.c
index b669ee68cdd95..57e0eefdb772b 100644
--- a/clang/test/Sema/builtins.c
+++ b/clang/test/Sema/builtins.c
@@ -281,7 +281,7 @@ void test_ei_i42i(_BitInt(42) *ptr, int value) {
// expected-warning@+1 {{the semantics of this intrinsic changed with GCC version 4.4 - the newer semantics are provided here}}
__sync_nand_and_fetch(ptr, value); // expected-error {{atomic memory operand must have a power-of-two size}}
- __atomic_fetch_add(ptr, 1, 0); // expected-error {{argument to atomic builtin of type '_BitInt' is not supported}}
+ __atomic_fetch_add(ptr, 1, 0); // expect success: the GNU atomic builtins support _BitInt
}
void test_ei_i64i(_BitInt(64) *ptr, int value) {
@@ -289,7 +289,7 @@ void test_ei_i64i(_BitInt(64) *ptr, int value) {
// expected-warning@+1 {{the semantics of this intrinsic changed with GCC version 4.4 - the newer semantics are provided here}}
__sync_nand_and_fetch(ptr, value); // expect success
- __atomic_fetch_add(ptr, 1, 0); // expected-error {{argument to atomic builtin of type '_BitInt' is not supported}}
+ __atomic_fetch_add(ptr, 1, 0); // expect success
}
void test_ei_ii42(int *ptr, _BitInt(42) value) {
diff --git a/clang/test/SemaCXX/ext-int.cpp b/clang/test/SemaCXX/ext-int.cpp
index 281ae3d3c1779..f62a07a84200e 100644
--- a/clang/test/SemaCXX/ext-int.cpp
+++ b/clang/test/SemaCXX/ext-int.cpp
@@ -121,13 +121,11 @@ _Complex _BitInt(3) Cmplx;
// expected-error@+1{{'_Complex _BitInt' is invalid}}
typedef _Complex _BitInt(3) Cmp;
-// Reject cases of _Atomic:
-// expected-error@+1{{_Atomic cannot be applied to integer type '_BitInt(4)'}}
-_Atomic _BitInt(4) TooSmallAtomic;
-// expected-error@+1{{_Atomic cannot be applied to integer type '_BitInt(9)'}}
+// _Atomic accepts any _BitInt width: small and non-power-of-2 included.
+// Sizes the target cannot lower inline use the __atomic_* libcalls.
+_Atomic _BitInt(4) SmallAtomic;
_Atomic _BitInt(9) NotPow2Atomic;
-// expected-error@+1{{_Atomic cannot be applied to integer type '_BitInt(128)'}}
-_Atomic _BitInt(128) JustRightAtomic;
+_Atomic _BitInt(128) WideAtomic;
// Test result types of Unary/Bitwise/Binary Operations:
void Ops() {
diff --git a/libcxx/test/libcxx/atomics/bit-int.verify.cpp b/libcxx/test/libcxx/atomics/bit-int.verify.cpp
deleted file mode 100644
index 03880a1b6215c..0000000000000
--- a/libcxx/test/libcxx/atomics/bit-int.verify.cpp
+++ /dev/null
@@ -1,22 +0,0 @@
-//===----------------------------------------------------------------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-
-// <atomic>
-
-// Make sure that `std::atomic` doesn't work with `_BitInt`. The intent is to
-// disable them for now until their behavior can be designed better later.
-// See https://reviews.llvm.org/D84049 for details.
-
-// UNSUPPORTED: c++03
-
-#include <atomic>
-
-void f() {
- // expected-error@*:*1 {{_Atomic cannot be applied to integer type '_BitInt(32)'}}
- std::atomic<_BitInt(32)> x(42);
-}
From 92e21eaa61bf0f26c3ba825545845f531d79c4d6 Mon Sep 17 00:00:00 2001
From: Xavier Roche <xavier.roche at algolia.com>
Date: Tue, 23 Jun 2026 11:53:49 +0200
Subject: [PATCH 2/9] [Clang][POC] Add C23/C2y Sema tests for atomic _BitInt(N)
C23 requires the type-generic atomic interfaces to accept _BitInt(N), so
_Atomic(_BitInt(N)) is well-formed at every width. Add a Sema acceptance
test covering the _Atomic specifier and the __c11_atomic_*/__atomic_*
builtins in C23 and C2y modes, and a -std=c2y run of the codegen test.
Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
---
clang/test/CodeGen/atomic-bitint.c | 1 +
clang/test/Sema/atomic-bitint.c | 40 ++++++++++++++++++++++++++++++
2 files changed, 41 insertions(+)
create mode 100644 clang/test/Sema/atomic-bitint.c
diff --git a/clang/test/CodeGen/atomic-bitint.c b/clang/test/CodeGen/atomic-bitint.c
index 358b530e8a792..9fa259776bf62 100644
--- a/clang/test/CodeGen/atomic-bitint.c
+++ b/clang/test/CodeGen/atomic-bitint.c
@@ -1,4 +1,5 @@
// RUN: %clang_cc1 -std=c23 -triple x86_64-unknown-linux-gnu -emit-llvm %s -o - | FileCheck %s
+// RUN: %clang_cc1 -std=c2y -triple x86_64-unknown-linux-gnu -emit-llvm %s -o - | FileCheck %s
//
// Atomic operations on _BitInt(N). load/store/exchange/compare-exchange and
// bitwise RMW lower directly; arithmetic RMW on a padded width and any RMW on a
diff --git a/clang/test/Sema/atomic-bitint.c b/clang/test/Sema/atomic-bitint.c
new file mode 100644
index 0000000000000..1466412bae732
--- /dev/null
+++ b/clang/test/Sema/atomic-bitint.c
@@ -0,0 +1,40 @@
+// RUN: %clang_cc1 %s -fsyntax-only -verify -triple x86_64-unknown-linux-gnu -std=c23
+// RUN: %clang_cc1 %s -fsyntax-only -verify -triple x86_64-unknown-linux-gnu -std=c2y
+//
+// C23 requires the type-generic atomic interfaces to accept _BitInt(N) for
+// every N, so _Atomic(_BitInt(N)) is well-formed at every width. Widths past
+// 128 are x86-only.
+
+// expected-no-diagnostics
+
+_Atomic(_BitInt(4)) a4; // small
+_Atomic(_BitInt(9)) a9; // non-power-of-two
+_Atomic(_BitInt(37)) a37; // padded
+_Atomic(_BitInt(64)) a64;
+_Atomic(_BitInt(128)) a128;
+_Atomic(_BitInt(256)) a256; // wider than any inline atomic
+
+// The _Atomic qualifier spelling is equally valid.
+_Atomic _BitInt(9) q9;
+
+static_assert(sizeof(_Atomic(_BitInt(37))) == 8);
+static_assert(sizeof(_Atomic(_BitInt(128))) == 16);
+static_assert(sizeof(_Atomic(_BitInt(256))) == 32);
+
+void c11_builtins(_Atomic(_BitInt(37)) *p, _BitInt(37) v, _BitInt(37) *e) {
+ (void)__c11_atomic_load(p, __ATOMIC_SEQ_CST);
+ __c11_atomic_store(p, v, __ATOMIC_SEQ_CST);
+ (void)__c11_atomic_exchange(p, v, __ATOMIC_SEQ_CST);
+ (void)__c11_atomic_compare_exchange_strong(p, e, v, __ATOMIC_SEQ_CST,
+ __ATOMIC_SEQ_CST);
+ (void)__c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
+ (void)__c11_atomic_fetch_and(p, v, __ATOMIC_SEQ_CST);
+ (void)__c11_atomic_fetch_min(p, v, __ATOMIC_SEQ_CST);
+}
+
+// The GNU __atomic_* builtins take a plain _BitInt pointer.
+void gnu_builtins(_BitInt(37) *p, _BitInt(37) v) {
+ (void)__atomic_load_n(p, __ATOMIC_SEQ_CST);
+ __atomic_store_n(p, v, __ATOMIC_SEQ_CST);
+ (void)__atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
+}
From c0e011fb61086a5a0cae13c54abe21ab312e85c7 Mon Sep 17 00:00:00 2001
From: Xavier Roche <xavier.roche at algolia.com>
Date: Tue, 23 Jun 2026 13:51:48 +0200
Subject: [PATCH 3/9] [Clang][POC] Extend atomic _BitInt Sema test: width 4096
+ RISC-V
Add a _BitInt(4096) acceptance case and a riscv64 RUN line. The atomic
code imposes no width cap of its own, so the only limit is the target's
getMaxBitIntWidth(): x86 and RISC-V allow widths past 128, while other targets
cap at 128. Correct the comment that claimed wide widths were x86-only.
Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
---
clang/test/Sema/atomic-bitint.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/clang/test/Sema/atomic-bitint.c b/clang/test/Sema/atomic-bitint.c
index 1466412bae732..fbb4c518438fb 100644
--- a/clang/test/Sema/atomic-bitint.c
+++ b/clang/test/Sema/atomic-bitint.c
@@ -1,18 +1,21 @@
// RUN: %clang_cc1 %s -fsyntax-only -verify -triple x86_64-unknown-linux-gnu -std=c23
// RUN: %clang_cc1 %s -fsyntax-only -verify -triple x86_64-unknown-linux-gnu -std=c2y
+// RUN: %clang_cc1 %s -fsyntax-only -verify -triple riscv64-unknown-linux-gnu -std=c23
//
// C23 requires the type-generic atomic interfaces to accept _BitInt(N) for
-// every N, so _Atomic(_BitInt(N)) is well-formed at every width. Widths past
-// 128 are x86-only.
+// every N, so _Atomic(_BitInt(N)) is well-formed at every width. The atomic
+// code imposes no width cap of its own; widths past 128 are available wherever
+// the target accepts _BitInt > 128 (x86 and RISC-V today).
// expected-no-diagnostics
-_Atomic(_BitInt(4)) a4; // small
-_Atomic(_BitInt(9)) a9; // non-power-of-two
-_Atomic(_BitInt(37)) a37; // padded
-_Atomic(_BitInt(64)) a64;
-_Atomic(_BitInt(128)) a128;
-_Atomic(_BitInt(256)) a256; // wider than any inline atomic
+_Atomic(_BitInt(4)) a4; // small
+_Atomic(_BitInt(9)) a9; // non-power-of-two
+_Atomic(_BitInt(37)) a37; // padded
+_Atomic(_BitInt(64)) a64;
+_Atomic(_BitInt(128)) a128;
+_Atomic(_BitInt(256)) a256; // wider than any inline atomic
+_Atomic(_BitInt(4096)) a4096; // far past the inline range
// The _Atomic qualifier spelling is equally valid.
_Atomic _BitInt(9) q9;
From 92e2068b4837befaa51c040c0c882ea7f10cd57f Mon Sep 17 00:00:00 2001
From: Xavier Roche <xavier.roche at algolia.com>
Date: Fri, 26 Jun 2026 21:21:00 +0200
Subject: [PATCH 4/9] [Clang][NFC] Address review: atomic _BitInt diagnostic
list and cast
Drop the now-unreachable _BitInt ("integer") option from
err_atomic_specifier_bad_type and renumber the _Atomic-auto/C23 case to
fill the gap; the rendered text is unchanged. Use static_cast instead of
a C-style cast in atomicOrderOrSeqCst.
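
To check the "rendered text is unchanged" claim by hand, the two %select lists
can be modeled as plain string tables. This is only a sketch in C: it is not
how Clang's diagnostic engine is implemented, the table and function names are
invented here, and the quoted type name 'auto' is an assumed rendering of the
%1 argument.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Options of the first and second %select before this commit (10 each). */
static const char *const old_prefix[] = {
    "incomplete ", "array ", "function ", "reference ", "atomic ",
    "qualified ", "sizeless ", "", "integer ", ""};
static const char *const old_suffix[] = {
    "", "", "", "", "", "", "",
    "which is not trivially copyable", "", "in C23"};

/* The same lists after dropping the "integer" slot (9 each). */
static const char *const new_prefix[] = {
    "incomplete ", "array ", "function ", "reference ", "atomic ",
    "qualified ", "sizeless ", "", ""};
static const char *const new_suffix[] = {
    "", "", "", "", "", "", "",
    "which is not trivially copyable", "in C23"};

/* Render "_Atomic cannot be applied to <prefix>type <T> <suffix>". */
static void render(char *out, size_t n, const char *const *prefix,
                   const char *const *suffix, int kind, const char *type) {
    snprintf(out, n, "_Atomic cannot be applied to %stype %s %s",
             prefix[kind], type, suffix[kind]);
}

/* The C23 "_Atomic auto" case moved from index 9 to index 8. */
static void demo_select(void) {
    char before[128], after[128];
    render(before, sizeof before, old_prefix, old_suffix, 9, "'auto'");
    render(after, sizeof after, new_prefix, new_suffix, 8, "'auto'");
    assert(strcmp(before, after) == 0);
}
```

The lower indices (0 through 7) select the same strings in both tables, so
only the renumbered C23 case needs the matching change in SemaType.cpp.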
Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
---
clang/include/clang/Basic/DiagnosticSemaKinds.td | 4 ++--
clang/lib/CodeGen/CGAtomic.cpp | 2 +-
clang/lib/Sema/SemaType.cpp | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index cde99dfb16ec5..414357c5a7c73 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -7475,8 +7475,8 @@ def err_func_def_incomplete_result : Error<
def err_atomic_specifier_bad_type
: Error<"_Atomic cannot be applied to "
"%select{incomplete |array |function |reference |atomic |qualified "
- "|sizeless ||integer |}0type "
- "%1 %select{|||||||which is not trivially copyable||in C23}0">;
+ "|sizeless ||}0type "
+ "%1 %select{|||||||which is not trivially copyable|in C23}0">;
def warn_atomic_member_access : Warning<
"accessing a member of an atomic structure or union is undefined behavior">,
InGroup<DiagGroup<"atomic-access">>, DefaultError;
diff --git a/clang/lib/CodeGen/CGAtomic.cpp b/clang/lib/CodeGen/CGAtomic.cpp
index 66c059fd40e26..820849f5974c0 100644
--- a/clang/lib/CodeGen/CGAtomic.cpp
+++ b/clang/lib/CodeGen/CGAtomic.cpp
@@ -683,7 +683,7 @@ static llvm::AtomicOrdering atomicOrderOrSeqCst(llvm::Value *Order) {
auto *C = dyn_cast<llvm::ConstantInt>(Order);
if (!C || !llvm::isValidAtomicOrderingCABI(C->getZExtValue()))
return llvm::AtomicOrdering::SequentiallyConsistent;
- switch ((llvm::AtomicOrderingCABI)C->getZExtValue()) {
+ switch (static_cast<llvm::AtomicOrderingCABI>(C->getZExtValue())) {
case llvm::AtomicOrderingCABI::relaxed:
return llvm::AtomicOrdering::Monotonic;
case llvm::AtomicOrderingCABI::consume:
diff --git a/clang/lib/Sema/SemaType.cpp b/clang/lib/Sema/SemaType.cpp
index 4a3506c281acf..f76244d5f2871 100644
--- a/clang/lib/Sema/SemaType.cpp
+++ b/clang/lib/Sema/SemaType.cpp
@@ -10414,7 +10414,7 @@ QualType Sema::BuildAtomicType(QualType T, SourceLocation Loc) {
DisallowedKind = 7;
else if (getLangOpts().C23 && T->isUndeducedAutoType())
// _Atomic auto is prohibited in C23
- DisallowedKind = 9;
+ DisallowedKind = 8;
if (DisallowedKind != -1) {
Diag(Loc, diag::err_atomic_specifier_bad_type) << DisallowedKind << T;
From ece57436ed0217f4c1ad4ca92f10182793ec109f Mon Sep 17 00:00:00 2001
From: Xavier Roche <xavier.roche at algolia.com>
Date: Fri, 26 Jun 2026 21:21:02 +0200
Subject: [PATCH 5/9] [Clang][NFC] Regenerate atomic-bitint.c checks with
update_cc_test_checks
The prior spot-checks did not show the lowering. Regenerate complete
check lines so the compare-exchange loop, the width-N arithmetic, and the
sext/zext memory canonicalization are all visible.
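
As a rough illustration of why the width-N comparison matters for signed min,
the sketch below compares a 64-bit view of the storage with a 37-bit view. It
is portable C that models the padded storage as a uint64_t rather than using
_BitInt, and the names sext37, min_as_i64 and min_as_i37 are hypothetical
helpers, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low 37 bits of 'bits' as a signed 37-bit integer. */
static int64_t sext37(uint64_t bits) {
    const uint64_t mask = (UINT64_C(1) << 37) - 1;
    const uint64_t sign = UINT64_C(1) << 36;
    bits &= mask;
    /* Both operands fit in int64_t, so the subtraction cannot overflow. */
    return (int64_t)(bits ^ sign) - (int64_t)sign;
}

/* Signed min over the whole 64-bit word: the sign bit is bit 63.
   'stored' is assumed to be below 2^63 in this demo. */
static int64_t min_as_i64(uint64_t stored, int64_t rhs) {
    int64_t lhs = (int64_t)stored;
    return lhs <= rhs ? lhs : rhs;
}

/* Signed min at the value width: the sign bit is bit 36. */
static int64_t min_as_i37(uint64_t stored, int64_t rhs) {
    int64_t lhs = sext37(stored);
    return lhs <= rhs ? lhs : rhs;
}

/* A word with bit 36 set and zero padding, i.e. a non-canonical
   encoding of the most negative 37-bit value. */
static void demo_min(void) {
    const uint64_t stored = UINT64_C(1) << 36;
    assert(min_as_i64(stored, 0) == 0);                    /* sees +2^36 */
    assert(min_as_i37(stored, 0) == -(INT64_C(1) << 36));  /* sees -2^36 */
}
```

With canonical (sign-extended) storage the two views agree; they diverge only
once the padding bits stop matching the sign bit, which is the situation the
compare-exchange loop is meant to avoid.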
Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
---
clang/test/CodeGen/atomic-bitint.c | 350 ++++++++++++++++++++++++++---
1 file changed, 321 insertions(+), 29 deletions(-)
diff --git a/clang/test/CodeGen/atomic-bitint.c b/clang/test/CodeGen/atomic-bitint.c
index 9fa259776bf62..6476c26a0f0dd 100644
--- a/clang/test/CodeGen/atomic-bitint.c
+++ b/clang/test/CodeGen/atomic-bitint.c
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 6
// RUN: %clang_cc1 -std=c23 -triple x86_64-unknown-linux-gnu -emit-llvm %s -o - | FileCheck %s
// RUN: %clang_cc1 -std=c2y -triple x86_64-unknown-linux-gnu -emit-llvm %s -o - | FileCheck %s
//
@@ -12,80 +13,371 @@ typedef _BitInt(64) S64;
typedef _BitInt(128) S128;
typedef _BitInt(256) S256;
-// CHECK-LABEL: @ld37(
-// CHECK: load atomic i64
+// CHECK-LABEL: define dso_local i64 @ld37(
+// CHECK-SAME: ptr noundef [[P:%.*]]) #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[RETVAL:%.*]] = alloca i37, align 8
+// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load atomic i64, ptr [[TMP0]] seq_cst, align 8
+// CHECK-NEXT: store i64 [[TMP1]], ptr [[ATOMIC_TEMP]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[ATOMIC_TEMP]], align 8
+// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP2]] to i37
+// CHECK-NEXT: store i37 [[LOADEDV]], ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[TMP3:%.*]] = load i37, ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP3]] to i64
+// CHECK-NEXT: ret i64 [[COERCE_VAL_II]]
+//
S37 ld37(_Atomic(S37) *p) { return __c11_atomic_load(p, __ATOMIC_SEQ_CST); }
-// CHECK-LABEL: @st37(
-// CHECK: store atomic i64
+// CHECK-LABEL: define dso_local void @st37(
+// CHECK-SAME: ptr noundef [[P:%.*]], i64 noundef [[V_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[V:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8
+// CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37
+// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[STOREDV:%.*]] = sext i37 [[V1]] to i64
+// CHECK-NEXT: store i64 [[STOREDV]], ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP2]] to i37
+// CHECK-NEXT: [[STOREDV2:%.*]] = sext i37 [[LOADEDV]] to i64
+// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: store atomic i64 [[TMP3]], ptr [[TMP1]] seq_cst, align 8
+// CHECK-NEXT: ret void
+//
void st37(_Atomic(S37) *p, S37 v) { __c11_atomic_store(p, v, __ATOMIC_SEQ_CST); }
-// CHECK-LABEL: @xchg37(
-// CHECK: atomicrmw xchg ptr {{.*}} i64
+// CHECK-LABEL: define dso_local i64 @xchg37(
+// CHECK-SAME: ptr noundef [[P:%.*]], i64 noundef [[V_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[RETVAL:%.*]] = alloca i37, align 8
+// CHECK-NEXT: [[V:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8
+// CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37
+// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[STOREDV:%.*]] = sext i37 [[V1]] to i64
+// CHECK-NEXT: store i64 [[STOREDV]], ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP2]] to i37
+// CHECK-NEXT: [[STOREDV2:%.*]] = sext i37 [[LOADEDV]] to i64
+// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP4:%.*]] = atomicrmw xchg ptr [[TMP1]], i64 [[TMP3]] seq_cst, align 8
+// CHECK-NEXT: store i64 [[TMP4]], ptr [[ATOMIC_TEMP]], align 8
+// CHECK-NEXT: [[TMP5:%.*]] = load i64, ptr [[ATOMIC_TEMP]], align 8
+// CHECK-NEXT: [[LOADEDV3:%.*]] = trunc i64 [[TMP5]] to i37
+// CHECK-NEXT: store i37 [[LOADEDV3]], ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[TMP6:%.*]] = load i37, ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP6]] to i64
+// CHECK-NEXT: ret i64 [[COERCE_VAL_II]]
+//
S37 xchg37(_Atomic(S37) *p, S37 v) {
return __c11_atomic_exchange(p, v, __ATOMIC_SEQ_CST);
}
-// CHECK-LABEL: @cas37(
-// CHECK: cmpxchg ptr {{.*}} i64
+// CHECK-LABEL: define dso_local zeroext i1 @cas37(
+// CHECK-SAME: ptr noundef [[P:%.*]], ptr noundef [[E:%.*]], i64 noundef [[D_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[D:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[E_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[D_ADDR:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[CMPXCHG_BOOL:%.*]] = alloca i8, align 1
+// CHECK-NEXT: store i64 [[D_COERCE]], ptr [[D]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[D]], align 8
+// CHECK-NEXT: [[D1:%.*]] = trunc i64 [[TMP0]] to i37
+// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// CHECK-NEXT: store ptr [[E]], ptr [[E_ADDR]], align 8
+// CHECK-NEXT: [[STOREDV:%.*]] = sext i37 [[D1]] to i64
+// CHECK-NEXT: store i64 [[STOREDV]], ptr [[D_ADDR]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load ptr, ptr [[E_ADDR]], align 8
+// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[D_ADDR]], align 8
+// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP3]] to i37
+// CHECK-NEXT: [[STOREDV2:%.*]] = sext i37 [[LOADEDV]] to i64
+// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr [[TMP2]], align 8
+// CHECK-NEXT: [[TMP5:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP6:%.*]] = cmpxchg ptr [[TMP1]], i64 [[TMP4]], i64 [[TMP5]] seq_cst seq_cst, align 8
+// CHECK-NEXT: [[TMP7:%.*]] = extractvalue { i64, i1 } [[TMP6]], 0
+// CHECK-NEXT: [[TMP8:%.*]] = extractvalue { i64, i1 } [[TMP6]], 1
+// CHECK-NEXT: br i1 [[TMP8]], label %[[CMPXCHG_CONTINUE:.*]], label %[[CMPXCHG_STORE_EXPECTED:.*]]
+// CHECK: [[CMPXCHG_STORE_EXPECTED]]:
+// CHECK-NEXT: store i64 [[TMP7]], ptr [[TMP2]], align 8
+// CHECK-NEXT: br label %[[CMPXCHG_CONTINUE]]
+// CHECK: [[CMPXCHG_CONTINUE]]:
+// CHECK-NEXT: [[STOREDV3:%.*]] = zext i1 [[TMP8]] to i8
+// CHECK-NEXT: store i8 [[STOREDV3]], ptr [[CMPXCHG_BOOL]], align 1
+// CHECK-NEXT: [[TMP9:%.*]] = load i8, ptr [[CMPXCHG_BOOL]], align 1
+// CHECK-NEXT: [[LOADEDV4:%.*]] = icmp ne i8 [[TMP9]], 0
+// CHECK-NEXT: ret i1 [[LOADEDV4]]
+//
_Bool cas37(_Atomic(S37) *p, S37 *e, S37 d) {
return __c11_atomic_compare_exchange_strong(p, e, d, __ATOMIC_SEQ_CST,
__ATOMIC_SEQ_CST);
}
// Bitwise RMW on a padded width keeps the direct atomicrmw: it is exact.
-// CHECK-LABEL: @and37(
-// CHECK: atomicrmw and ptr {{.*}} i64
-// CHECK-NOT: cmpxchg
+// CHECK-LABEL: define dso_local i64 @and37(
+// CHECK-SAME: ptr noundef [[P:%.*]], i64 noundef [[V_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[RETVAL:%.*]] = alloca i37, align 8
+// CHECK-NEXT: [[V:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8
+// CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37
+// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[STOREDV:%.*]] = sext i37 [[V1]] to i64
+// CHECK-NEXT: store i64 [[STOREDV]], ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP2]] to i37
+// CHECK-NEXT: [[STOREDV2:%.*]] = sext i37 [[LOADEDV]] to i64
+// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP4:%.*]] = atomicrmw and ptr [[TMP1]], i64 [[TMP3]] seq_cst, align 8
+// CHECK-NEXT: store i64 [[TMP4]], ptr [[ATOMIC_TEMP]], align 8
+// CHECK-NEXT: [[TMP5:%.*]] = load i64, ptr [[ATOMIC_TEMP]], align 8
+// CHECK-NEXT: [[LOADEDV3:%.*]] = trunc i64 [[TMP5]] to i37
+// CHECK-NEXT: store i37 [[LOADEDV3]], ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[TMP6:%.*]] = load i37, ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP6]] to i64
+// CHECK-NEXT: ret i64 [[COERCE_VAL_II]]
+//
S37 and37(_Atomic(S37) *p, S37 v) {
return __c11_atomic_fetch_and(p, v, __ATOMIC_SEQ_CST);
}
// Arithmetic RMW on a padded width becomes a compare-exchange loop, not a bare
// atomicrmw that would carry into the padding bits.
-// CHECK-LABEL: @add37(
-// CHECK: atomicrmw.start:
-// CHECK: cmpxchg weak ptr {{.*}} i64
-// CHECK-NOT: atomicrmw add
+// CHECK-LABEL: define dso_local i64 @add37(
+// CHECK-SAME: ptr noundef [[P:%.*]], i64 noundef [[V_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*]]:
+// CHECK-NEXT: [[RETVAL:%.*]] = alloca i37, align 8
+// CHECK-NEXT: [[V:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8
+// CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37
+// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[STOREDV:%.*]] = sext i37 [[V1]] to i64
+// CHECK-NEXT: store i64 [[STOREDV]], ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP2]] to i37
+// CHECK-NEXT: [[STOREDV2:%.*]] = sext i37 [[LOADEDV]] to i64
+// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[LOADEDV3:%.*]] = trunc i64 [[TMP3]] to i37
+// CHECK-NEXT: [[ATOMIC_LOAD:%.*]] = load atomic i64, ptr [[TMP1]] monotonic, align 8
+// CHECK-NEXT: [[LOADEDV4:%.*]] = trunc i64 [[ATOMIC_LOAD]] to i37
+// CHECK-NEXT: br label %[[ATOMICRMW_START:.*]]
+// CHECK: [[ATOMICRMW_START]]:
+// CHECK-NEXT: [[TMP4:%.*]] = phi i37 [ [[LOADEDV4]], %[[ENTRY]] ], [ [[LOADEDV7:%.*]], %[[ATOMICRMW_START]] ]
+// CHECK-NEXT: [[NEW:%.*]] = add i37 [[TMP4]], [[LOADEDV3]]
+// CHECK-NEXT: [[STOREDV5:%.*]] = sext i37 [[TMP4]] to i64
+// CHECK-NEXT: [[STOREDV6:%.*]] = sext i37 [[NEW]] to i64
+// CHECK-NEXT: [[TMP5:%.*]] = cmpxchg weak ptr [[TMP1]], i64 [[STOREDV5]], i64 [[STOREDV6]] seq_cst seq_cst, align 8
+// CHECK-NEXT: [[TMP6:%.*]] = extractvalue { i64, i1 } [[TMP5]], 0
+// CHECK-NEXT: [[TMP7:%.*]] = extractvalue { i64, i1 } [[TMP5]], 1
+// CHECK-NEXT: [[LOADEDV7]] = trunc i64 [[TMP6]] to i37
+// CHECK-NEXT: br i1 [[TMP7]], label %[[ATOMICRMW_END:.*]], label %[[ATOMICRMW_START]]
+// CHECK: [[ATOMICRMW_END]]:
+// CHECK-NEXT: store i37 [[TMP4]], ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[TMP8:%.*]] = load i37, ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP8]] to i64
+// CHECK-NEXT: ret i64 [[COERCE_VAL_II]]
+//
S37 add37(_Atomic(S37) *p, S37 v) {
return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}
// Signed min is computed at the value width, so the sign bit is at bit N-1.
-// CHECK-LABEL: @min37(
-// CHECK: icmp sle i37
-// CHECK: select i1
-// CHECK: cmpxchg weak ptr {{.*}} i64
+// CHECK-LABEL: define dso_local i64 @min37(
+// CHECK-SAME: ptr noundef [[P:%.*]], i64 noundef [[V_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*]]:
+// CHECK-NEXT: [[RETVAL:%.*]] = alloca i37, align 8
+// CHECK-NEXT: [[V:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8
+// CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37
+// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[STOREDV:%.*]] = sext i37 [[V1]] to i64
+// CHECK-NEXT: store i64 [[STOREDV]], ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP2]] to i37
+// CHECK-NEXT: [[STOREDV2:%.*]] = sext i37 [[LOADEDV]] to i64
+// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[LOADEDV3:%.*]] = trunc i64 [[TMP3]] to i37
+// CHECK-NEXT: [[ATOMIC_LOAD:%.*]] = load atomic i64, ptr [[TMP1]] monotonic, align 8
+// CHECK-NEXT: [[LOADEDV4:%.*]] = trunc i64 [[ATOMIC_LOAD]] to i37
+// CHECK-NEXT: br label %[[ATOMICRMW_START:.*]]
+// CHECK: [[ATOMICRMW_START]]:
+// CHECK-NEXT: [[TMP4:%.*]] = phi i37 [ [[LOADEDV4]], %[[ENTRY]] ], [ [[LOADEDV7:%.*]], %[[ATOMICRMW_START]] ]
+// CHECK-NEXT: [[TMP5:%.*]] = icmp sle i37 [[TMP4]], [[LOADEDV3]]
+// CHECK-NEXT: [[NEW:%.*]] = select i1 [[TMP5]], i37 [[TMP4]], i37 [[LOADEDV3]]
+// CHECK-NEXT: [[STOREDV5:%.*]] = sext i37 [[TMP4]] to i64
+// CHECK-NEXT: [[STOREDV6:%.*]] = sext i37 [[NEW]] to i64
+// CHECK-NEXT: [[TMP6:%.*]] = cmpxchg weak ptr [[TMP1]], i64 [[STOREDV5]], i64 [[STOREDV6]] seq_cst seq_cst, align 8
+// CHECK-NEXT: [[TMP7:%.*]] = extractvalue { i64, i1 } [[TMP6]], 0
+// CHECK-NEXT: [[TMP8:%.*]] = extractvalue { i64, i1 } [[TMP6]], 1
+// CHECK-NEXT: [[LOADEDV7]] = trunc i64 [[TMP7]] to i37
+// CHECK-NEXT: br i1 [[TMP8]], label %[[ATOMICRMW_END:.*]], label %[[ATOMICRMW_START]]
+// CHECK: [[ATOMICRMW_END]]:
+// CHECK-NEXT: store i37 [[TMP4]], ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[TMP9:%.*]] = load i37, ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP9]] to i64
+// CHECK-NEXT: ret i64 [[COERCE_VAL_II]]
+//
U37 min37(_Atomic(S37) *p, S37 v) {
return __c11_atomic_fetch_min(p, v, __ATOMIC_SEQ_CST);
}
// No padding: direct atomicrmw, no loop.
-// CHECK-LABEL: @add64(
-// CHECK: atomicrmw add ptr {{.*}} i64
-// CHECK-NOT: cmpxchg
+// CHECK-LABEL: define dso_local i64 @add64(
+// CHECK-SAME: ptr noundef [[P:%.*]], i64 noundef [[V:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// CHECK-NEXT: store i64 [[V]], ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load i64, ptr [[V_ADDR]], align 8
+// CHECK-NEXT: store i64 [[TMP1]], ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP3:%.*]] = atomicrmw add ptr [[TMP0]], i64 [[TMP2]] seq_cst, align 8
+// CHECK-NEXT: store i64 [[TMP3]], ptr [[ATOMIC_TEMP]], align 8
+// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr [[ATOMIC_TEMP]], align 8
+// CHECK-NEXT: ret i64 [[TMP4]]
+//
S64 add64(_Atomic(S64) *p, S64 v) {
return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}
-// CHECK-LABEL: @add128(
-// CHECK: atomicrmw add ptr {{.*}} i128
+// CHECK-LABEL: define dso_local i128 @add128(
+// CHECK-SAME: ptr noundef [[P:%.*]], i128 noundef [[V:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i128, align 8
+// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i128, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i128, align 16
+// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// CHECK-NEXT: store i128 [[V]], ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load i128, ptr [[V_ADDR]], align 8
+// CHECK-NEXT: store i128 [[TMP1]], ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load i128, ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP3:%.*]] = atomicrmw add ptr [[TMP0]], i128 [[TMP2]] seq_cst, align 16
+// CHECK-NEXT: store i128 [[TMP3]], ptr [[ATOMIC_TEMP]], align 16
+// CHECK-NEXT: [[TMP4:%.*]] = load i128, ptr [[ATOMIC_TEMP]], align 16
+// CHECK-NEXT: ret i128 [[TMP4]]
+//
S128 add128(_Atomic(S128) *p, S128 v) {
return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}
// Wide: no inline atomicrmw and no arbitrary-width __atomic_fetch_add libcall,
// so the loop calls __atomic_compare_exchange.
-// CHECK-LABEL: @add256(
-// CHECK: call {{.*}}@__atomic_compare_exchange
-// CHECK-NOT: cmpxchg
+// CHECK-LABEL: define dso_local void @add256(
+// CHECK-SAME: ptr dead_on_unwind noalias writable sret(i256) align 8 [[AGG_RESULT:%.*]], ptr noundef [[P:%.*]], ptr noundef byval(i256) align 8 [[TMP0:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*]]:
+// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i256, align 8
+// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i256, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i256, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP1:%.*]] = alloca i256, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP2:%.*]] = alloca i256, align 8
+// CHECK-NEXT: [[V:%.*]] = load i256, ptr [[TMP0]], align 8
+// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// CHECK-NEXT: store i256 [[V]], ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load i256, ptr [[V_ADDR]], align 8
+// CHECK-NEXT: store i256 [[TMP2]], ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP3:%.*]] = load i256, ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: call void @__atomic_load(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP]], i32 noundef 0)
+// CHECK-NEXT: [[TMP4:%.*]] = load i256, ptr [[ATOMIC_TEMP]], align 8
+// CHECK-NEXT: br label %[[ATOMICRMW_START:.*]]
+// CHECK: [[ATOMICRMW_START]]:
+// CHECK-NEXT: [[TMP5:%.*]] = phi i256 [ [[TMP4]], %[[ENTRY]] ], [ [[TMP6:%.*]], %[[ATOMICRMW_START]] ]
+// CHECK-NEXT: [[NEW:%.*]] = add i256 [[TMP5]], [[TMP3]]
+// CHECK-NEXT: store i256 [[TMP5]], ptr [[ATOMIC_TEMP1]], align 8
+// CHECK-NEXT: store i256 [[NEW]], ptr [[ATOMIC_TEMP2]], align 8
+// CHECK-NEXT: [[CALL:%.*]] = call zeroext i1 @__atomic_compare_exchange(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP1]], ptr noundef [[ATOMIC_TEMP2]], i32 noundef 5, i32 noundef 5)
+// CHECK-NEXT: [[TMP6]] = load i256, ptr [[ATOMIC_TEMP1]], align 8
+// CHECK-NEXT: br i1 [[CALL]], label %[[ATOMICRMW_END:.*]], label %[[ATOMICRMW_START]]
+// CHECK: [[ATOMICRMW_END]]:
+// CHECK-NEXT: store i256 [[TMP5]], ptr [[AGG_RESULT]], align 8
+// CHECK-NEXT: [[TMP7:%.*]] = load i256, ptr [[AGG_RESULT]], align 8
+// CHECK-NEXT: store i256 [[TMP7]], ptr [[AGG_RESULT]], align 8
+// CHECK-NEXT: ret void
+//
S256 add256(_Atomic(S256) *p, S256 v) {
return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}
// Wide bitwise also needs the loop: the wide path has no inline atomicrmw.
-// CHECK-LABEL: @or256(
-// CHECK: call {{.*}}@__atomic_compare_exchange
+// CHECK-LABEL: define dso_local void @or256(
+// CHECK-SAME: ptr dead_on_unwind noalias writable sret(i256) align 8 [[AGG_RESULT:%.*]], ptr noundef [[P:%.*]], ptr noundef byval(i256) align 8 [[TMP0:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*]]:
+// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i256, align 8
+// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i256, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i256, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP1:%.*]] = alloca i256, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP2:%.*]] = alloca i256, align 8
+// CHECK-NEXT: [[V:%.*]] = load i256, ptr [[TMP0]], align 8
+// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// CHECK-NEXT: store i256 [[V]], ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load i256, ptr [[V_ADDR]], align 8
+// CHECK-NEXT: store i256 [[TMP2]], ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP3:%.*]] = load i256, ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: call void @__atomic_load(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP]], i32 noundef 0)
+// CHECK-NEXT: [[TMP4:%.*]] = load i256, ptr [[ATOMIC_TEMP]], align 8
+// CHECK-NEXT: br label %[[ATOMICRMW_START:.*]]
+// CHECK: [[ATOMICRMW_START]]:
+// CHECK-NEXT: [[TMP5:%.*]] = phi i256 [ [[TMP4]], %[[ENTRY]] ], [ [[TMP6:%.*]], %[[ATOMICRMW_START]] ]
+// CHECK-NEXT: [[NEW:%.*]] = or i256 [[TMP5]], [[TMP3]]
+// CHECK-NEXT: store i256 [[TMP5]], ptr [[ATOMIC_TEMP1]], align 8
+// CHECK-NEXT: store i256 [[NEW]], ptr [[ATOMIC_TEMP2]], align 8
+// CHECK-NEXT: [[CALL:%.*]] = call zeroext i1 @__atomic_compare_exchange(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP1]], ptr noundef [[ATOMIC_TEMP2]], i32 noundef 5, i32 noundef 5)
+// CHECK-NEXT: [[TMP6]] = load i256, ptr [[ATOMIC_TEMP1]], align 8
+// CHECK-NEXT: br i1 [[CALL]], label %[[ATOMICRMW_END:.*]], label %[[ATOMICRMW_START]]
+// CHECK: [[ATOMICRMW_END]]:
+// CHECK-NEXT: store i256 [[TMP5]], ptr [[AGG_RESULT]], align 8
+// CHECK-NEXT: [[TMP7:%.*]] = load i256, ptr [[AGG_RESULT]], align 8
+// CHECK-NEXT: store i256 [[TMP7]], ptr [[AGG_RESULT]], align 8
+// CHECK-NEXT: ret void
+//
S256 or256(_Atomic(S256) *p, S256 v) {
return __c11_atomic_fetch_or(p, v, __ATOMIC_SEQ_CST);
}
>From 1c87c9994199bd82766b4d6d6f5edf392c225ba2 Mon Sep 17 00:00:00 2001
From: Xavier Roche <xavier.roche at algolia.com>
Date: Sat, 27 Jun 2026 09:23:04 +0200
Subject: [PATCH 6/9] [Clang] Carry the raw representation in the _BitInt atomic RMW loop
The compare-exchange loop for padded and wide _BitInt atomics formed its
cmpxchg expected value by re-canonicalizing the loaded value (sign- or
zero-extending the truncated old value). An object whose padding bits were
non-canonical, e.g. one written through a union, never matched that
expected value, so the cmpxchg failed on every iteration and the
read-modify-write spun forever.
Reuse the existing EmitAtomicUpdate loop, which carries the raw loaded
representation as the expected value and writes back a canonical desired
value computed at value width N. The loop no longer depends on the object's
padding: the first cmpxchg succeeds unless another thread intervenes, and the
value it stores is canonical. See P0528.
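
For illustration only, here is a rough C model of the loop shape described
above. It is not the code Clang emits. It assumes a signed 37-bit value held
in a 64-bit container, and the names canon37 and fetch_add37 are hypothetical
helpers invented for this sketch. The point it tries to show is that the
expected operand is the raw container word just read, padding included, while
the desired operand is computed at width 37 and sign-extended.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

enum { WIDTH = 37 }; /* assumed value width, mirroring _BitInt(37) */

/* Hypothetical helper: keep the low WIDTH bits of raw and sign-extend them
 * to 64 bits (the "trunc" then "sext" steps). The final cast assumes the
 * usual two's-complement conversion. */
static int64_t canon37(uint64_t raw) {
    uint64_t mask = (UINT64_C(1) << WIDTH) - 1;
    uint64_t sign = UINT64_C(1) << (WIDTH - 1);
    uint64_t low = raw & mask;
    return (int64_t)((low ^ sign) - sign);
}

/* Hypothetical model of fetch_add on the padded container. */
static int64_t fetch_add37(_Atomic uint64_t *obj, int64_t v) {
    /* Raw representation: whatever bits are in memory, padding included. */
    uint64_t raw = atomic_load(obj);
    for (;;) {
        /* Old value at width WIDTH; padding bits are ignored here. */
        int64_t old = canon37(raw);
        /* New value wraps mod 2^WIDTH and is stored canonically. */
        uint64_t desired = (uint64_t)canon37((uint64_t)old + (uint64_t)v);
        /* expected == raw, so non-canonical padding still matches. On
         * failure, raw is refreshed with the current contents. */
        if (atomic_compare_exchange_strong(obj, &raw, desired))
            return old;
    }
}
```

If the expected operand were instead rebuilt as (uint64_t)canon37(raw), an
object holding garbage in bits 37..63 would never compare equal, which is the
livelock this patch removes.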
Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
---
clang/lib/CodeGen/CGAtomic.cpp | 48 +++++-------
clang/test/CodeGen/atomic-bitint.c | 120 ++++++++++++++---------------
2 files changed, 77 insertions(+), 91 deletions(-)
diff --git a/clang/lib/CodeGen/CGAtomic.cpp b/clang/lib/CodeGen/CGAtomic.cpp
index 820849f5974c0..0043c79b398ee 100644
--- a/clang/lib/CodeGen/CGAtomic.cpp
+++ b/clang/lib/CodeGen/CGAtomic.cpp
@@ -702,9 +702,13 @@ static llvm::AtomicOrdering atomicOrderOrSeqCst(llvm::Value *Order) {
/// Emit a `_BitInt(N)` atomic read-modify-write as a compare-exchange loop. A
/// single `atomicrmw` on the padded memory integer would carry into / compare
/// the padding bits, and no arbitrary-width `__atomic_fetch_*` libcall exists
-/// for wide widths. The loop computes the new value at width N and writes back
-/// a canonical (extended) representation via the existing cmpxchg helper, which
-/// also picks the inline-vs-libcall form by size.
+/// for wide widths.
+///
+/// The update computes at value width N (so the result wraps mod 2^N and is
+/// independent of padding). EmitAtomicUpdate carries the raw loaded
+/// representation as the cmpxchg expected, so an object with non-canonical
+/// padding (e.g. written through a union) still converges instead of spinning
+/// forever; the desired it writes back is canonical. See P0528.
static RValue emitBitIntAtomicRMWLoop(CodeGenFunction &CGF, AtomicExpr *E,
Address Ptr, Address Val1,
QualType AtomicTy,
@@ -712,8 +716,6 @@ static RValue emitBitIntAtomicRMWLoop(CodeGenFunction &CGF, AtomicExpr *E,
bool ReturnsNew, llvm::Value *Order) {
QualType ValTy = E->getValueType();
llvm::AtomicOrdering AO = atomicOrderOrSeqCst(Order);
- llvm::AtomicOrdering Failure =
- llvm::AtomicCmpXchgInst::getStrongestFailureOrdering(AO);
LValue AtomicLVal = CGF.MakeAddrLValue(Ptr, AtomicTy);
AtomicInfo Atomics(CGF, AtomicLVal);
@@ -721,31 +723,17 @@ static RValue emitBitIntAtomicRMWLoop(CodeGenFunction &CGF, AtomicExpr *E,
llvm::Value *RHS =
CGF.EmitLoadOfScalar(CGF.MakeAddrLValue(Val1, ValTy), E->getExprLoc());
- RValue OldRV = Atomics.EmitAtomicLoad(
- AggValueSlot::ignored(), E->getExprLoc(),
- /*AsValue=*/true, llvm::AtomicOrdering::Monotonic, E->isVolatile());
- llvm::Value *Init = OldRV.getScalarVal();
-
- llvm::BasicBlock *StartBB = CGF.Builder.GetInsertBlock();
- llvm::BasicBlock *LoopBB = CGF.createBasicBlock("atomicrmw.start", CGF.CurFn);
- llvm::BasicBlock *EndBB = CGF.createBasicBlock("atomicrmw.end", CGF.CurFn);
- CGF.Builder.CreateBr(LoopBB);
- CGF.Builder.SetInsertPoint(LoopBB);
-
- llvm::PHINode *Old = CGF.Builder.CreatePHI(Init->getType(), 2);
- Old->addIncoming(Init, StartBB);
-
- // Compute at the value width via the canonical RMW lowering, so the result
- // wraps mod 2^N and never touches the padding bits.
- llvm::Value *New = llvm::buildAtomicRMWValue(BinOp, CGF.Builder, Old, RHS);
-
- auto Res = Atomics.EmitAtomicCompareExchange(
- RValue::get(Old), RValue::get(New), AO, Failure, /*IsWeak=*/true);
- Old->addIncoming(Res.first.getScalarVal(), CGF.Builder.GetInsertBlock());
- CGF.Builder.CreateCondBr(Res.second, EndBB, LoopBB);
-
- CGF.Builder.SetInsertPoint(EndBB);
- return RValue::get(ReturnsNew ? New : static_cast<llvm::Value *>(Old));
+ llvm::Value *Old = nullptr, *New = nullptr;
+ Atomics.EmitAtomicUpdate(
+ AO,
+ [&](RValue OldRV) {
+ Old = OldRV.getScalarVal();
+ New = llvm::buildAtomicRMWValue(BinOp, CGF.Builder, Old, RHS);
+ return RValue::get(New);
+ },
+ E->isVolatile());
+
+ return RValue::get(ReturnsNew ? New : Old);
}
static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, Address Dest,
diff --git a/clang/test/CodeGen/atomic-bitint.c b/clang/test/CodeGen/atomic-bitint.c
index 6476c26a0f0dd..bc1e165fd90e3 100644
--- a/clang/test/CodeGen/atomic-bitint.c
+++ b/clang/test/CodeGen/atomic-bitint.c
@@ -178,6 +178,7 @@ S37 and37(_Atomic(S37) *p, S37 v) {
// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8
// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i64, align 8
// CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8
// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8
// CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37
@@ -191,23 +192,23 @@ S37 and37(_Atomic(S37) *p, S37 v) {
// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8
// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8
// CHECK-NEXT: [[LOADEDV3:%.*]] = trunc i64 [[TMP3]] to i37
-// CHECK-NEXT: [[ATOMIC_LOAD:%.*]] = load atomic i64, ptr [[TMP1]] monotonic, align 8
-// CHECK-NEXT: [[LOADEDV4:%.*]] = trunc i64 [[ATOMIC_LOAD]] to i37
-// CHECK-NEXT: br label %[[ATOMICRMW_START:.*]]
-// CHECK: [[ATOMICRMW_START]]:
-// CHECK-NEXT: [[TMP4:%.*]] = phi i37 [ [[LOADEDV4]], %[[ENTRY]] ], [ [[LOADEDV7:%.*]], %[[ATOMICRMW_START]] ]
-// CHECK-NEXT: [[NEW:%.*]] = add i37 [[TMP4]], [[LOADEDV3]]
-// CHECK-NEXT: [[STOREDV5:%.*]] = sext i37 [[TMP4]] to i64
-// CHECK-NEXT: [[STOREDV6:%.*]] = sext i37 [[NEW]] to i64
-// CHECK-NEXT: [[TMP5:%.*]] = cmpxchg weak ptr [[TMP1]], i64 [[STOREDV5]], i64 [[STOREDV6]] seq_cst seq_cst, align 8
-// CHECK-NEXT: [[TMP6:%.*]] = extractvalue { i64, i1 } [[TMP5]], 0
-// CHECK-NEXT: [[TMP7:%.*]] = extractvalue { i64, i1 } [[TMP5]], 1
-// CHECK-NEXT: [[LOADEDV7]] = trunc i64 [[TMP6]] to i37
-// CHECK-NEXT: br i1 [[TMP7]], label %[[ATOMICRMW_END:.*]], label %[[ATOMICRMW_START]]
-// CHECK: [[ATOMICRMW_END]]:
-// CHECK-NEXT: store i37 [[TMP4]], ptr [[RETVAL]], align 8
-// CHECK-NEXT: [[TMP8:%.*]] = load i37, ptr [[RETVAL]], align 8
-// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP8]] to i64
+// CHECK-NEXT: [[ATOMIC_LOAD:%.*]] = load atomic i64, ptr [[TMP1]] seq_cst, align 8
+// CHECK-NEXT: br label %[[ATOMIC_CONT:.*]]
+// CHECK: [[ATOMIC_CONT]]:
+// CHECK-NEXT: [[TMP4:%.*]] = phi i64 [ [[ATOMIC_LOAD]], %[[ENTRY]] ], [ [[TMP7:%.*]], %[[ATOMIC_CONT]] ]
+// CHECK-NEXT: [[LOADEDV4:%.*]] = trunc i64 [[TMP4]] to i37
+// CHECK-NEXT: [[NEW:%.*]] = add i37 [[LOADEDV4]], [[LOADEDV3]]
+// CHECK-NEXT: [[STOREDV5:%.*]] = sext i37 [[NEW]] to i64
+// CHECK-NEXT: store atomic i64 [[STOREDV5]], ptr [[ATOMIC_TEMP]] seq_cst, align 8
+// CHECK-NEXT: [[TMP5:%.*]] = load i64, ptr [[ATOMIC_TEMP]], align 8
+// CHECK-NEXT: [[TMP6:%.*]] = cmpxchg ptr [[TMP1]], i64 [[TMP4]], i64 [[TMP5]] seq_cst seq_cst, align 8
+// CHECK-NEXT: [[TMP7]] = extractvalue { i64, i1 } [[TMP6]], 0
+// CHECK-NEXT: [[TMP8:%.*]] = extractvalue { i64, i1 } [[TMP6]], 1
+// CHECK-NEXT: br i1 [[TMP8]], label %[[ATOMIC_EXIT:.*]], label %[[ATOMIC_CONT]]
+// CHECK: [[ATOMIC_EXIT]]:
+// CHECK-NEXT: store i37 [[LOADEDV4]], ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[TMP9:%.*]] = load i37, ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP9]] to i64
// CHECK-NEXT: ret i64 [[COERCE_VAL_II]]
//
S37 add37(_Atomic(S37) *p, S37 v) {
@@ -223,6 +224,7 @@ S37 add37(_Atomic(S37) *p, S37 v) {
// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8
// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i64, align 8
// CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8
// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8
// CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37
@@ -236,24 +238,24 @@ S37 add37(_Atomic(S37) *p, S37 v) {
// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8
// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8
// CHECK-NEXT: [[LOADEDV3:%.*]] = trunc i64 [[TMP3]] to i37
-// CHECK-NEXT: [[ATOMIC_LOAD:%.*]] = load atomic i64, ptr [[TMP1]] monotonic, align 8
-// CHECK-NEXT: [[LOADEDV4:%.*]] = trunc i64 [[ATOMIC_LOAD]] to i37
-// CHECK-NEXT: br label %[[ATOMICRMW_START:.*]]
-// CHECK: [[ATOMICRMW_START]]:
-// CHECK-NEXT: [[TMP4:%.*]] = phi i37 [ [[LOADEDV4]], %[[ENTRY]] ], [ [[LOADEDV7:%.*]], %[[ATOMICRMW_START]] ]
-// CHECK-NEXT: [[TMP5:%.*]] = icmp sle i37 [[TMP4]], [[LOADEDV3]]
-// CHECK-NEXT: [[NEW:%.*]] = select i1 [[TMP5]], i37 [[TMP4]], i37 [[LOADEDV3]]
-// CHECK-NEXT: [[STOREDV5:%.*]] = sext i37 [[TMP4]] to i64
-// CHECK-NEXT: [[STOREDV6:%.*]] = sext i37 [[NEW]] to i64
-// CHECK-NEXT: [[TMP6:%.*]] = cmpxchg weak ptr [[TMP1]], i64 [[STOREDV5]], i64 [[STOREDV6]] seq_cst seq_cst, align 8
-// CHECK-NEXT: [[TMP7:%.*]] = extractvalue { i64, i1 } [[TMP6]], 0
-// CHECK-NEXT: [[TMP8:%.*]] = extractvalue { i64, i1 } [[TMP6]], 1
-// CHECK-NEXT: [[LOADEDV7]] = trunc i64 [[TMP7]] to i37
-// CHECK-NEXT: br i1 [[TMP8]], label %[[ATOMICRMW_END:.*]], label %[[ATOMICRMW_START]]
-// CHECK: [[ATOMICRMW_END]]:
-// CHECK-NEXT: store i37 [[TMP4]], ptr [[RETVAL]], align 8
-// CHECK-NEXT: [[TMP9:%.*]] = load i37, ptr [[RETVAL]], align 8
-// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP9]] to i64
+// CHECK-NEXT: [[ATOMIC_LOAD:%.*]] = load atomic i64, ptr [[TMP1]] seq_cst, align 8
+// CHECK-NEXT: br label %[[ATOMIC_CONT:.*]]
+// CHECK: [[ATOMIC_CONT]]:
+// CHECK-NEXT: [[TMP4:%.*]] = phi i64 [ [[ATOMIC_LOAD]], %[[ENTRY]] ], [ [[TMP8:%.*]], %[[ATOMIC_CONT]] ]
+// CHECK-NEXT: [[LOADEDV4:%.*]] = trunc i64 [[TMP4]] to i37
+// CHECK-NEXT: [[TMP5:%.*]] = icmp sle i37 [[LOADEDV4]], [[LOADEDV3]]
+// CHECK-NEXT: [[NEW:%.*]] = select i1 [[TMP5]], i37 [[LOADEDV4]], i37 [[LOADEDV3]]
+// CHECK-NEXT: [[STOREDV5:%.*]] = sext i37 [[NEW]] to i64
+// CHECK-NEXT: store atomic i64 [[STOREDV5]], ptr [[ATOMIC_TEMP]] seq_cst, align 8
+// CHECK-NEXT: [[TMP6:%.*]] = load i64, ptr [[ATOMIC_TEMP]], align 8
+// CHECK-NEXT: [[TMP7:%.*]] = cmpxchg ptr [[TMP1]], i64 [[TMP4]], i64 [[TMP6]] seq_cst seq_cst, align 8
+// CHECK-NEXT: [[TMP8]] = extractvalue { i64, i1 } [[TMP7]], 0
+// CHECK-NEXT: [[TMP9:%.*]] = extractvalue { i64, i1 } [[TMP7]], 1
+// CHECK-NEXT: br i1 [[TMP9]], label %[[ATOMIC_EXIT:.*]], label %[[ATOMIC_CONT]]
+// CHECK: [[ATOMIC_EXIT]]:
+// CHECK-NEXT: store i37 [[LOADEDV4]], ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[TMP10:%.*]] = load i37, ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP10]] to i64
// CHECK-NEXT: ret i64 [[COERCE_VAL_II]]
//
U37 min37(_Atomic(S37) *p, S37 v) {
@@ -309,7 +311,7 @@ S128 add128(_Atomic(S128) *p, S128 v) {
// so the loop calls __atomic_compare_exchange.
// CHECK-LABEL: define dso_local void @add256(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret(i256) align 8 [[AGG_RESULT:%.*]], ptr noundef [[P:%.*]], ptr noundef byval(i256) align 8 [[TMP0:%.*]]) #[[ATTR0]] {
-// CHECK-NEXT: [[ENTRY:.*]]:
+// CHECK-NEXT: [[ENTRY:.*:]]
// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i256, align 8
// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i256, align 8
@@ -323,21 +325,19 @@ S128 add128(_Atomic(S128) *p, S128 v) {
// CHECK-NEXT: [[TMP2:%.*]] = load i256, ptr [[V_ADDR]], align 8
// CHECK-NEXT: store i256 [[TMP2]], ptr [[DOTATOMICTMP]], align 8
// CHECK-NEXT: [[TMP3:%.*]] = load i256, ptr [[DOTATOMICTMP]], align 8
-// CHECK-NEXT: call void @__atomic_load(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP]], i32 noundef 0)
+// CHECK-NEXT: call void @__atomic_load(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP]], i32 noundef 5)
+// CHECK-NEXT: br label %[[ATOMIC_CONT:.*]]
+// CHECK: [[ATOMIC_CONT]]:
// CHECK-NEXT: [[TMP4:%.*]] = load i256, ptr [[ATOMIC_TEMP]], align 8
-// CHECK-NEXT: br label %[[ATOMICRMW_START:.*]]
-// CHECK: [[ATOMICRMW_START]]:
-// CHECK-NEXT: [[TMP5:%.*]] = phi i256 [ [[TMP4]], %[[ENTRY]] ], [ [[TMP6:%.*]], %[[ATOMICRMW_START]] ]
-// CHECK-NEXT: [[NEW:%.*]] = add i256 [[TMP5]], [[TMP3]]
-// CHECK-NEXT: store i256 [[TMP5]], ptr [[ATOMIC_TEMP1]], align 8
+// CHECK-NEXT: [[NEW:%.*]] = add i256 [[TMP4]], [[TMP3]]
// CHECK-NEXT: store i256 [[NEW]], ptr [[ATOMIC_TEMP2]], align 8
-// CHECK-NEXT: [[CALL:%.*]] = call zeroext i1 @__atomic_compare_exchange(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP1]], ptr noundef [[ATOMIC_TEMP2]], i32 noundef 5, i32 noundef 5)
-// CHECK-NEXT: [[TMP6]] = load i256, ptr [[ATOMIC_TEMP1]], align 8
-// CHECK-NEXT: br i1 [[CALL]], label %[[ATOMICRMW_END:.*]], label %[[ATOMICRMW_START]]
-// CHECK: [[ATOMICRMW_END]]:
+// CHECK-NEXT: call void @__atomic_store(i64 noundef 32, ptr noundef [[ATOMIC_TEMP1]], ptr noundef [[ATOMIC_TEMP2]], i32 noundef 5)
+// CHECK-NEXT: [[CALL:%.*]] = call zeroext i1 @__atomic_compare_exchange(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP]], ptr noundef [[ATOMIC_TEMP1]], i32 noundef 5, i32 noundef 5)
+// CHECK-NEXT: br i1 [[CALL]], label %[[ATOMIC_EXIT:.*]], label %[[ATOMIC_CONT]]
+// CHECK: [[ATOMIC_EXIT]]:
+// CHECK-NEXT: store i256 [[TMP4]], ptr [[AGG_RESULT]], align 8
+// CHECK-NEXT: [[TMP5:%.*]] = load i256, ptr [[AGG_RESULT]], align 8
// CHECK-NEXT: store i256 [[TMP5]], ptr [[AGG_RESULT]], align 8
-// CHECK-NEXT: [[TMP7:%.*]] = load i256, ptr [[AGG_RESULT]], align 8
-// CHECK-NEXT: store i256 [[TMP7]], ptr [[AGG_RESULT]], align 8
// CHECK-NEXT: ret void
//
S256 add256(_Atomic(S256) *p, S256 v) {
@@ -347,7 +347,7 @@ S256 add256(_Atomic(S256) *p, S256 v) {
// Wide bitwise also needs the loop: the wide path has no inline atomicrmw.
// CHECK-LABEL: define dso_local void @or256(
// CHECK-SAME: ptr dead_on_unwind noalias writable sret(i256) align 8 [[AGG_RESULT:%.*]], ptr noundef [[P:%.*]], ptr noundef byval(i256) align 8 [[TMP0:%.*]]) #[[ATTR0]] {
-// CHECK-NEXT: [[ENTRY:.*]]:
+// CHECK-NEXT: [[ENTRY:.*:]]
// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i256, align 8
// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i256, align 8
@@ -361,21 +361,19 @@ S256 add256(_Atomic(S256) *p, S256 v) {
// CHECK-NEXT: [[TMP2:%.*]] = load i256, ptr [[V_ADDR]], align 8
// CHECK-NEXT: store i256 [[TMP2]], ptr [[DOTATOMICTMP]], align 8
// CHECK-NEXT: [[TMP3:%.*]] = load i256, ptr [[DOTATOMICTMP]], align 8
-// CHECK-NEXT: call void @__atomic_load(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP]], i32 noundef 0)
+// CHECK-NEXT: call void @__atomic_load(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP]], i32 noundef 5)
+// CHECK-NEXT: br label %[[ATOMIC_CONT:.*]]
+// CHECK: [[ATOMIC_CONT]]:
// CHECK-NEXT: [[TMP4:%.*]] = load i256, ptr [[ATOMIC_TEMP]], align 8
-// CHECK-NEXT: br label %[[ATOMICRMW_START:.*]]
-// CHECK: [[ATOMICRMW_START]]:
-// CHECK-NEXT: [[TMP5:%.*]] = phi i256 [ [[TMP4]], %[[ENTRY]] ], [ [[TMP6:%.*]], %[[ATOMICRMW_START]] ]
-// CHECK-NEXT: [[NEW:%.*]] = or i256 [[TMP5]], [[TMP3]]
-// CHECK-NEXT: store i256 [[TMP5]], ptr [[ATOMIC_TEMP1]], align 8
+// CHECK-NEXT: [[NEW:%.*]] = or i256 [[TMP4]], [[TMP3]]
// CHECK-NEXT: store i256 [[NEW]], ptr [[ATOMIC_TEMP2]], align 8
-// CHECK-NEXT: [[CALL:%.*]] = call zeroext i1 @__atomic_compare_exchange(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP1]], ptr noundef [[ATOMIC_TEMP2]], i32 noundef 5, i32 noundef 5)
-// CHECK-NEXT: [[TMP6]] = load i256, ptr [[ATOMIC_TEMP1]], align 8
-// CHECK-NEXT: br i1 [[CALL]], label %[[ATOMICRMW_END:.*]], label %[[ATOMICRMW_START]]
-// CHECK: [[ATOMICRMW_END]]:
+// CHECK-NEXT: call void @__atomic_store(i64 noundef 32, ptr noundef [[ATOMIC_TEMP1]], ptr noundef [[ATOMIC_TEMP2]], i32 noundef 5)
+// CHECK-NEXT: [[CALL:%.*]] = call zeroext i1 @__atomic_compare_exchange(i64 noundef 32, ptr noundef [[TMP1]], ptr noundef [[ATOMIC_TEMP]], ptr noundef [[ATOMIC_TEMP1]], i32 noundef 5, i32 noundef 5)
+// CHECK-NEXT: br i1 [[CALL]], label %[[ATOMIC_EXIT:.*]], label %[[ATOMIC_CONT]]
+// CHECK: [[ATOMIC_EXIT]]:
+// CHECK-NEXT: store i256 [[TMP4]], ptr [[AGG_RESULT]], align 8
+// CHECK-NEXT: [[TMP5:%.*]] = load i256, ptr [[AGG_RESULT]], align 8
// CHECK-NEXT: store i256 [[TMP5]], ptr [[AGG_RESULT]], align 8
-// CHECK-NEXT: [[TMP7:%.*]] = load i256, ptr [[AGG_RESULT]], align 8
-// CHECK-NEXT: store i256 [[TMP7]], ptr [[AGG_RESULT]], align 8
// CHECK-NEXT: ret void
//
S256 or256(_Atomic(S256) *p, S256 v) {
>From e79f0f4e3fbec4c77fcae7f42f2d9dc5f3e85c5a Mon Sep 17 00:00:00 2001
From: Xavier Roche <xavier.roche at algolia.com>
Date: Sat, 27 Jun 2026 09:34:49 +0200
Subject: [PATCH 7/9] [compiler-rt] Add runtime test for atomic _BitInt(N)
Single-threaded execution test for _Atomic(_BitInt(N)): per-op value
correctness on a padded inline width and on wide libcall widths, plus
dirty-padding convergence. An object with non-canonical padding (written
through a union) must not spin forever in the read-modify-write
compare-exchange loop. The IR-shape checks in
clang/test/CodeGen/atomic-bitint.c cannot witness non-termination.
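The convergence property can be sketched without _BitInt support, as an emulation on a 64-bit container using C11 <stdatomic.h>. The names canonical_s37 and fetch_add_s37 and the recanonicalize_expected flag are hypothetical and not part of this patch. The sketch assumes a 37-bit signed value sign-extended into a uint64_t, which roughly mirrors the inline i64 loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define VALUE_BITS 37
#define VALUE_MASK ((UINT64_C(1) << VALUE_BITS) - 1)
#define SIGN_BIT (UINT64_C(1) << (VALUE_BITS - 1))

/* Canonical form: low 37 bits sign-extended into the 64-bit container. */
static uint64_t canonical_s37(uint64_t raw) {
  uint64_t v = raw & VALUE_MASK;
  return (v ^ SIGN_BIT) - SIGN_BIT;
}

/* Hypothetical emulation of a fetch_add compare-exchange loop.
 * Returns true if the loop converged within max_tries; *old_out then holds
 * the previous value. With recanonicalize_expected set, the loop compares
 * against a cleaned-up copy of memory and can never match dirty padding. */
static bool fetch_add_s37(_Atomic uint64_t *obj, uint64_t rhs,
                          bool recanonicalize_expected, unsigned max_tries,
                          uint64_t *old_out) {
  assert(max_tries > 0);
  uint64_t expected = atomic_load(obj); /* raw bits, padding included */
  for (unsigned i = 0; i < max_tries; ++i) {
    if (recanonicalize_expected)
      expected = canonical_s37(expected); /* the bug being guarded against */
    uint64_t desired = canonical_s37(expected + rhs); /* compute at width 37 */
    if (atomic_compare_exchange_strong(obj, &expected, desired)) {
      *old_out = canonical_s37(expected);
      return true;
    }
    /* On failure, expected now holds the raw bits currently in memory. */
  }
  return false;
}
```

Seeding the container with ((uint64_t)1 << 40) | 5 and calling fetch_add_s37 with the flag cleared converges on the first attempt. With the flag set it exhausts max_tries, which is the hang the runtime test is meant to catch.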
Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
---
.../test/builtins/Unit/atomic_bitint_test.c | 91 +++++++++++++++++++
1 file changed, 91 insertions(+)
create mode 100644 compiler-rt/test/builtins/Unit/atomic_bitint_test.c
diff --git a/compiler-rt/test/builtins/Unit/atomic_bitint_test.c b/compiler-rt/test/builtins/Unit/atomic_bitint_test.c
new file mode 100644
index 0000000000000..33a745348a6f0
--- /dev/null
+++ b/compiler-rt/test/builtins/Unit/atomic_bitint_test.c
@@ -0,0 +1,91 @@
+// RUN: %clang_builtins -std=c23 %s %librt -o %t && %run %t
+// REQUIRES: librt_has_atomic
+//===-- atomic_bitint_test.c - Test atomic ops on _BitInt -----------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Runtime checks for atomic read-modify-write on _BitInt(N). A padded width
+// (37) exercises the inline compare-exchange loop; a wide width (256) exercises
+// the __atomic_compare_exchange libcall loop. Each op is cross-checked against
+// the same operation done non-atomically, and the dirty-padding cases confirm
+// the loop converges (a re-canonicalized expected would spin forever).
+//
+//===----------------------------------------------------------------------===//
+
+#include <assert.h>
+#include <stdio.h>
+
+typedef signed _BitInt(37) S37;
+typedef unsigned _BitInt(37) U37;
+typedef signed _BitInt(256) S256; // no padding (exactly 32 bytes)
+typedef signed _BitInt(200) S200; // padded: 200 value bits in 32-byte storage
+
+// Each macro runs the atomic op and asserts the returned old value and the
+// resulting object both match the non-atomic computation at width N.
+#define CHECK_FETCH(T, init, op, rhs, expr) \
+ do { \
+ _Atomic(T) a = (init); \
+ T old = __c11_atomic_fetch_##op(&a, (rhs), __ATOMIC_SEQ_CST); \
+ assert(old == (T)(init)); \
+ assert((T)a == (T)(expr)); \
+ } while (0)
+
+static void test_ops(void) {
+ CHECK_FETCH(S37, 100, add, 5, 105);
+ CHECK_FETCH(S37, 100, sub, 40, 60);
+ CHECK_FETCH(S37, -3, add, 1, -2);
+ CHECK_FETCH(U37, 7, add, 9, 16);
+ CHECK_FETCH(S37, 0x15, and, 0x13, 0x11);
+ CHECK_FETCH(S37, 0x10, or, 5, 0x15);
+ CHECK_FETCH(S37, 0x1F, xor, 0x15, 0x0A);
+ CHECK_FETCH(S37, -5, min, -7, -7); // signed: -7 < -5
+ CHECK_FETCH(U37, 5, min, (U37)-1, 5); // unsigned: 5 < 2^37-1
+ CHECK_FETCH(S37, 3, max, 9, 9);
+ CHECK_FETCH(S37, 0x15, nand, 0x13, (S37) ~(0x15 & 0x13));
+ // Wide widths: the libcall loop (no padding, and padded).
+ CHECK_FETCH(S256, 100, add, 5, 105);
+ CHECK_FETCH(S256, 1, or, 0xFE, 0xFF);
+ CHECK_FETCH(S200, 100, add, 5, 105);
+}
+
+// Seed non-canonical padding through a union, then RMW. A loop that carried a
+// re-canonicalized expected would never match memory and hang here.
+static void test_dirty_padding(void) {
+ union {
+ _Atomic(S37) a;
+ unsigned long b;
+ } s;
+ s.b = ((unsigned long)1 << 40) | 5u; // value bits 5, padding bit 40 set
+ S37 old = __c11_atomic_fetch_add(&s.a, 1, __ATOMIC_SEQ_CST);
+ assert(old == 5 && (S37)s.a == 6);
+
+ union {
+ _Atomic(U37) a;
+ unsigned long b;
+ } u;
+ u.b = ((unsigned long)3 << 50) | 7u;
+ U37 uold = __c11_atomic_fetch_add(&u.a, 1, __ATOMIC_SEQ_CST);
+ assert(uold == 7 && (U37)u.a == 8);
+
+ // Wide padded width (libcall loop): _BitInt(200) has 56 padding bits in its
+ // 32-byte storage. Set the overlay at value level (endian-independent): low
+ // 200 bits = 5, padding bits 241, 243, 245, and 247 dirtied (0xAA << 240).
+ union {
+ _Atomic(S200) a;
+ unsigned _BitInt(256) full;
+ } w;
+ w.full = (unsigned _BitInt(256))5 | ((unsigned _BitInt(256))0xAA << 240);
+ S200 wold = __c11_atomic_fetch_add(&w.a, 1, __ATOMIC_SEQ_CST);
+ assert(wold == 5 && (S200)w.a == 6);
+}
+
+int main(void) {
+ test_ops();
+ test_dirty_padding();
+ printf("PASS\n");
+ return 0;
+}
>From 88f329b57018f8646b65484f0d36e92c9e54fd27 Mon Sep 17 00:00:00 2001
From: Xavier Roche <xavier.roche at algolia.com>
Date: Sat, 27 Jun 2026 09:56:57 +0200
Subject: [PATCH 8/9] [Clang][test] Expand _BitInt atomic Sema and CodeGen
coverage
Sema: add reject cases (non-_Atomic pointer, wrong arity, atomic _BitInt
bit-field) so lifting the _BitInt rejection does not silently drop the
atomic-specific checks, plus an accept case for __atomic_add_fetch (which
returns the new value).
CodeGen: add an unsigned arithmetic RMW (zero-extended desired), a signed
max, and an unsigned min, exercising the zext path and the icmp sgt/ule
predicates that the previous functions did not cover.
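The distinction between the zext and sext paths can be illustrated with a value-level model on a 64-bit container. This is a sketch only: the helper names (zext37, sext37, uadd37_value, smax37_value, umin37_value) are hypothetical, and plain uint64_t arithmetic stands in for the i37 operations in the CHECK lines.

```c
#include <assert.h>
#include <stdint.h>

#define VALUE_BITS 37
#define VALUE_MASK ((UINT64_C(1) << VALUE_BITS) - 1)
#define SIGN_BIT (UINT64_C(1) << (VALUE_BITS - 1))

static_assert(VALUE_BITS > 1 && VALUE_BITS < 64,
              "the container must be wider than the value");

/* Unsigned canonical form: padding bits are zero (zext i37 to i64). */
static uint64_t zext37(uint64_t raw) { return raw & VALUE_MASK; }

/* Signed canonical form: padding bits copy bit 36 (sext i37 to i64). */
static uint64_t sext37(uint64_t raw) {
  uint64_t v = raw & VALUE_MASK;
  return (v ^ SIGN_BIT) - SIGN_BIT;
}

/* Unsigned add at width 37: wraps modulo 2^37 and is stored zero-extended. */
static uint64_t uadd37_value(uint64_t a, uint64_t b) { return zext37(a + b); }

/* Signed max at width 37 (the icmp sgt case). Flipping the sign bit maps
 * signed order onto unsigned order, so no signed casts are needed. */
static uint64_t smax37_value(uint64_t a, uint64_t b) {
  uint64_t x = zext37(a) ^ SIGN_BIT;
  uint64_t y = zext37(b) ^ SIGN_BIT;
  return sext37(x > y ? a : b);
}

/* Unsigned min at width 37 (the icmp ule case). */
static uint64_t umin37_value(uint64_t a, uint64_t b) {
  return zext37(a) <= zext37(b) ? zext37(a) : zext37(b);
}
```

In this model the all-ones 37-bit pattern is -1 when signed and 2^37-1 when unsigned, so smax37_value(VALUE_MASK, 2) yields 2 while umin37_value(5, VALUE_MASK) yields 5.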
Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
---
clang/test/CodeGen/atomic-bitint.c | 140 +++++++++++++++++++++++++++++
clang/test/Sema/atomic-bitint.c | 15 +++-
2 files changed, 152 insertions(+), 3 deletions(-)
diff --git a/clang/test/CodeGen/atomic-bitint.c b/clang/test/CodeGen/atomic-bitint.c
index bc1e165fd90e3..dda8f644f3fec 100644
--- a/clang/test/CodeGen/atomic-bitint.c
+++ b/clang/test/CodeGen/atomic-bitint.c
@@ -262,6 +262,146 @@ U37 min37(_Atomic(S37) *p, S37 v) {
return __c11_atomic_fetch_min(p, v, __ATOMIC_SEQ_CST);
}
+// Unsigned arithmetic RMW: the desired is zero-extended, not sign-extended.
+// CHECK-LABEL: define dso_local i64 @uadd37(
+// CHECK-SAME: ptr noundef [[P:%.*]], i64 noundef [[V_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*]]:
+// CHECK-NEXT: [[RETVAL:%.*]] = alloca i37, align 8
+// CHECK-NEXT: [[V:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8
+// CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37
+// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[STOREDV:%.*]] = zext i37 [[V1]] to i64
+// CHECK-NEXT: store i64 [[STOREDV]], ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP2]] to i37
+// CHECK-NEXT: [[STOREDV2:%.*]] = zext i37 [[LOADEDV]] to i64
+// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[LOADEDV3:%.*]] = trunc i64 [[TMP3]] to i37
+// CHECK-NEXT: [[ATOMIC_LOAD:%.*]] = load atomic i64, ptr [[TMP1]] seq_cst, align 8
+// CHECK-NEXT: br label %[[ATOMIC_CONT:.*]]
+// CHECK: [[ATOMIC_CONT]]:
+// CHECK-NEXT: [[TMP4:%.*]] = phi i64 [ [[ATOMIC_LOAD]], %[[ENTRY]] ], [ [[TMP7:%.*]], %[[ATOMIC_CONT]] ]
+// CHECK-NEXT: [[LOADEDV4:%.*]] = trunc i64 [[TMP4]] to i37
+// CHECK-NEXT: [[NEW:%.*]] = add i37 [[LOADEDV4]], [[LOADEDV3]]
+// CHECK-NEXT: [[STOREDV5:%.*]] = zext i37 [[NEW]] to i64
+// CHECK-NEXT: store atomic i64 [[STOREDV5]], ptr [[ATOMIC_TEMP]] seq_cst, align 8
+// CHECK-NEXT: [[TMP5:%.*]] = load i64, ptr [[ATOMIC_TEMP]], align 8
+// CHECK-NEXT: [[TMP6:%.*]] = cmpxchg ptr [[TMP1]], i64 [[TMP4]], i64 [[TMP5]] seq_cst seq_cst, align 8
+// CHECK-NEXT: [[TMP7]] = extractvalue { i64, i1 } [[TMP6]], 0
+// CHECK-NEXT: [[TMP8:%.*]] = extractvalue { i64, i1 } [[TMP6]], 1
+// CHECK-NEXT: br i1 [[TMP8]], label %[[ATOMIC_EXIT:.*]], label %[[ATOMIC_CONT]]
+// CHECK: [[ATOMIC_EXIT]]:
+// CHECK-NEXT: store i37 [[LOADEDV4]], ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[TMP9:%.*]] = load i37, ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP9]] to i64
+// CHECK-NEXT: ret i64 [[COERCE_VAL_II]]
+//
+U37 uadd37(_Atomic(U37) *p, U37 v) {
+ return __c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
+}
+
+// Signed max computes at the value width with a signed compare.
+// CHECK-LABEL: define dso_local i64 @max37(
+// CHECK-SAME: ptr noundef [[P:%.*]], i64 noundef [[V_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*]]:
+// CHECK-NEXT: [[RETVAL:%.*]] = alloca i37, align 8
+// CHECK-NEXT: [[V:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8
+// CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37
+// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[STOREDV:%.*]] = sext i37 [[V1]] to i64
+// CHECK-NEXT: store i64 [[STOREDV]], ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP2]] to i37
+// CHECK-NEXT: [[STOREDV2:%.*]] = sext i37 [[LOADEDV]] to i64
+// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[LOADEDV3:%.*]] = trunc i64 [[TMP3]] to i37
+// CHECK-NEXT: [[ATOMIC_LOAD:%.*]] = load atomic i64, ptr [[TMP1]] seq_cst, align 8
+// CHECK-NEXT: br label %[[ATOMIC_CONT:.*]]
+// CHECK: [[ATOMIC_CONT]]:
+// CHECK-NEXT: [[TMP4:%.*]] = phi i64 [ [[ATOMIC_LOAD]], %[[ENTRY]] ], [ [[TMP8:%.*]], %[[ATOMIC_CONT]] ]
+// CHECK-NEXT: [[LOADEDV4:%.*]] = trunc i64 [[TMP4]] to i37
+// CHECK-NEXT: [[TMP5:%.*]] = icmp sgt i37 [[LOADEDV4]], [[LOADEDV3]]
+// CHECK-NEXT: [[NEW:%.*]] = select i1 [[TMP5]], i37 [[LOADEDV4]], i37 [[LOADEDV3]]
+// CHECK-NEXT: [[STOREDV5:%.*]] = sext i37 [[NEW]] to i64
+// CHECK-NEXT: store atomic i64 [[STOREDV5]], ptr [[ATOMIC_TEMP]] seq_cst, align 8
+// CHECK-NEXT: [[TMP6:%.*]] = load i64, ptr [[ATOMIC_TEMP]], align 8
+// CHECK-NEXT: [[TMP7:%.*]] = cmpxchg ptr [[TMP1]], i64 [[TMP4]], i64 [[TMP6]] seq_cst seq_cst, align 8
+// CHECK-NEXT: [[TMP8]] = extractvalue { i64, i1 } [[TMP7]], 0
+// CHECK-NEXT: [[TMP9:%.*]] = extractvalue { i64, i1 } [[TMP7]], 1
+// CHECK-NEXT: br i1 [[TMP9]], label %[[ATOMIC_EXIT:.*]], label %[[ATOMIC_CONT]]
+// CHECK: [[ATOMIC_EXIT]]:
+// CHECK-NEXT: store i37 [[LOADEDV4]], ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[TMP10:%.*]] = load i37, ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP10]] to i64
+// CHECK-NEXT: ret i64 [[COERCE_VAL_II]]
+//
+S37 max37(_Atomic(S37) *p, S37 v) {
+ return __c11_atomic_fetch_max(p, v, __ATOMIC_SEQ_CST);
+}
+
+// Unsigned min computes at the value width with an unsigned compare.
+// CHECK-LABEL: define dso_local i64 @umin37(
+// CHECK-SAME: ptr noundef [[P:%.*]], i64 noundef [[V_COERCE:%.*]]) #[[ATTR0]] {
+// CHECK-NEXT: [[ENTRY:.*]]:
+// CHECK-NEXT: [[RETVAL:%.*]] = alloca i37, align 8
+// CHECK-NEXT: [[V:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[P_ADDR:%.*]] = alloca ptr, align 8
+// CHECK-NEXT: [[V_ADDR:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[DOTATOMICTMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: [[ATOMIC_TEMP:%.*]] = alloca i64, align 8
+// CHECK-NEXT: store i64 [[V_COERCE]], ptr [[V]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = load i64, ptr [[V]], align 8
+// CHECK-NEXT: [[V1:%.*]] = trunc i64 [[TMP0]] to i37
+// CHECK-NEXT: store ptr [[P]], ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[STOREDV:%.*]] = zext i37 [[V1]] to i64
+// CHECK-NEXT: store i64 [[STOREDV]], ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = load ptr, ptr [[P_ADDR]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load i64, ptr [[V_ADDR]], align 8
+// CHECK-NEXT: [[LOADEDV:%.*]] = trunc i64 [[TMP2]] to i37
+// CHECK-NEXT: [[STOREDV2:%.*]] = zext i37 [[LOADEDV]] to i64
+// CHECK-NEXT: store i64 [[STOREDV2]], ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[DOTATOMICTMP]], align 8
+// CHECK-NEXT: [[LOADEDV3:%.*]] = trunc i64 [[TMP3]] to i37
+// CHECK-NEXT: [[ATOMIC_LOAD:%.*]] = load atomic i64, ptr [[TMP1]] seq_cst, align 8
+// CHECK-NEXT: br label %[[ATOMIC_CONT:.*]]
+// CHECK: [[ATOMIC_CONT]]:
+// CHECK-NEXT: [[TMP4:%.*]] = phi i64 [ [[ATOMIC_LOAD]], %[[ENTRY]] ], [ [[TMP8:%.*]], %[[ATOMIC_CONT]] ]
+// CHECK-NEXT: [[LOADEDV4:%.*]] = trunc i64 [[TMP4]] to i37
+// CHECK-NEXT: [[TMP5:%.*]] = icmp ule i37 [[LOADEDV4]], [[LOADEDV3]]
+// CHECK-NEXT: [[NEW:%.*]] = select i1 [[TMP5]], i37 [[LOADEDV4]], i37 [[LOADEDV3]]
+// CHECK-NEXT: [[STOREDV5:%.*]] = zext i37 [[NEW]] to i64
+// CHECK-NEXT: store atomic i64 [[STOREDV5]], ptr [[ATOMIC_TEMP]] seq_cst, align 8
+// CHECK-NEXT: [[TMP6:%.*]] = load i64, ptr [[ATOMIC_TEMP]], align 8
+// CHECK-NEXT: [[TMP7:%.*]] = cmpxchg ptr [[TMP1]], i64 [[TMP4]], i64 [[TMP6]] seq_cst seq_cst, align 8
+// CHECK-NEXT: [[TMP8]] = extractvalue { i64, i1 } [[TMP7]], 0
+// CHECK-NEXT: [[TMP9:%.*]] = extractvalue { i64, i1 } [[TMP7]], 1
+// CHECK-NEXT: br i1 [[TMP9]], label %[[ATOMIC_EXIT:.*]], label %[[ATOMIC_CONT]]
+// CHECK: [[ATOMIC_EXIT]]:
+// CHECK-NEXT: store i37 [[LOADEDV4]], ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[TMP10:%.*]] = load i37, ptr [[RETVAL]], align 8
+// CHECK-NEXT: [[COERCE_VAL_II:%.*]] = zext i37 [[TMP10]] to i64
+// CHECK-NEXT: ret i64 [[COERCE_VAL_II]]
+//
+U37 umin37(_Atomic(U37) *p, U37 v) {
+ return __c11_atomic_fetch_min(p, v, __ATOMIC_SEQ_CST);
+}
+
// No padding: direct atomicrmw, no loop.
// CHECK-LABEL: define dso_local i64 @add64(
// CHECK-SAME: ptr noundef [[P:%.*]], i64 noundef [[V:%.*]]) #[[ATTR0]] {
diff --git a/clang/test/Sema/atomic-bitint.c b/clang/test/Sema/atomic-bitint.c
index fbb4c518438fb..3bd4faf22e7be 100644
--- a/clang/test/Sema/atomic-bitint.c
+++ b/clang/test/Sema/atomic-bitint.c
@@ -7,8 +7,6 @@
// code imposes no width cap of its own; widths past 128 are available wherever
// the target accepts _BitInt > 128 (x86 and RISC-V today).
-// expected-no-diagnostics
-
_Atomic(_BitInt(4)) a4; // small
_Atomic(_BitInt(9)) a9; // non-power-of-two
_Atomic(_BitInt(37)) a37; // padded
@@ -35,9 +33,20 @@ void c11_builtins(_Atomic(_BitInt(37)) *p, _BitInt(37) v, _BitInt(37) *e) {
(void)__c11_atomic_fetch_min(p, v, __ATOMIC_SEQ_CST);
}
-// The GNU __atomic_* builtins take a plain _BitInt pointer.
+// The GNU __atomic_* builtins take a plain _BitInt pointer; the
+// __atomic_<op>_fetch forms return the new value.
void gnu_builtins(_BitInt(37) *p, _BitInt(37) v) {
(void)__atomic_load_n(p, __ATOMIC_SEQ_CST);
__atomic_store_n(p, v, __ATOMIC_SEQ_CST);
(void)__atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
+ (void)__atomic_add_fetch(p, v, __ATOMIC_SEQ_CST);
+}
+
+// Lifting the _BitInt rejection must not lose the atomic-specific checks.
+void rejects(_Atomic(_BitInt(37)) *ap, _BitInt(37) *p, _BitInt(37) v) {
+ (void)__c11_atomic_load(ap); // expected-error {{too few arguments to function call}}
+ (void)__c11_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST); // expected-error {{must be a pointer to _Atomic}}
}
+struct WithAtomicBitIntField {
+ _Atomic(_BitInt(5)) f : 3; // expected-error {{bit-field 'f' has non-integral type}}
+};
>From 9d349acf9dc9f5c74c4103c3ed60252ca95ff53e Mon Sep 17 00:00:00 2001
From: Xavier Roche <xavier.roche at algolia.com>
Date: Sat, 27 Jun 2026 09:56:58 +0200
Subject: [PATCH 9/9] [compiler-rt] Harden the _BitInt atomic runtime test
Use uint64_t for the dirty-padding overlay (unsigned long is 32-bit on
LLP64, where the padding-bit shift was undefined). Read the storage back
after a converged RMW to confirm the padding is canonicalized, and add
returns-new (__atomic_*_fetch) and non-seq_cst ordering coverage.
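Both points can be sketched on ordinary 64-bit integers, so the example builds on compilers without this series. The helper names (padding_bits_u37, dirty_overlay, fetch_then_add, add_then_fetch) are hypothetical; the __atomic_fetch_add and __atomic_add_fetch builtins are the existing GNU ones, applied here to int64_t rather than _BitInt.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of a 64-bit container above a 37-bit value. Zero means an unsigned
 * (zero-extended) value is stored canonically. */
static uint64_t padding_bits_u37(uint64_t storage) { return storage >> 37; }

/* Builds a dirty overlay. UINT64_C keeps the shift 64 bits wide on every
 * ABI; 1UL << 40 is undefined where unsigned long is 32 bits (LLP64). */
static uint64_t dirty_overlay(uint64_t value, unsigned padding_bit) {
  assert(padding_bit >= 37 && padding_bit < 64);
  return (UINT64_C(1) << padding_bit) | value;
}

/* Returns the value held before the addition. */
static int64_t fetch_then_add(int64_t *p, int64_t v) {
  return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}

/* Returns the value held after the addition. */
static int64_t add_then_fetch(int64_t *p, int64_t v) {
  return __atomic_add_fetch(p, v, __ATOMIC_SEQ_CST);
}
```

Starting from 100, fetch_then_add(&a, 5) returns 100 and leaves 105 in a; a following add_then_fetch(&a, 5) returns 110.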
Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
---
.../test/builtins/Unit/atomic_bitint_test.c | 37 +++++++++++++++++--
1 file changed, 33 insertions(+), 4 deletions(-)
diff --git a/compiler-rt/test/builtins/Unit/atomic_bitint_test.c b/compiler-rt/test/builtins/Unit/atomic_bitint_test.c
index 33a745348a6f0..e0cc3aef61bc2 100644
--- a/compiler-rt/test/builtins/Unit/atomic_bitint_test.c
+++ b/compiler-rt/test/builtins/Unit/atomic_bitint_test.c
@@ -17,6 +17,7 @@
//===----------------------------------------------------------------------===//
#include <assert.h>
+#include <stdint.h>
#include <stdio.h>
typedef signed _BitInt(37) S37;
@@ -55,21 +56,25 @@ static void test_ops(void) {
// Seed non-canonical padding through a union, then RMW. A loop that carried a
// re-canonicalized expected would never match memory and hang here.
static void test_dirty_padding(void) {
+ // uint64_t (not unsigned long, which is 32-bit on LLP64) so the padding bit
+ // is representable and the overlay matches the 8-byte atomic.
union {
_Atomic(S37) a;
- unsigned long b;
+ uint64_t b;
} s;
- s.b = ((unsigned long)1 << 40) | 5u; // value bits 5, padding bit 40 set
+ s.b = ((uint64_t)1 << 40) | 5u; // value bits 5, padding bit 40 set
S37 old = __c11_atomic_fetch_add(&s.a, 1, __ATOMIC_SEQ_CST);
assert(old == 5 && (S37)s.a == 6);
+ assert((s.b >> 37) == 0); // padding canonicalized (positive value)
union {
_Atomic(U37) a;
- unsigned long b;
+ uint64_t b;
} u;
- u.b = ((unsigned long)3 << 50) | 7u;
+ u.b = ((uint64_t)3 << 50) | 7u;
U37 uold = __c11_atomic_fetch_add(&u.a, 1, __ATOMIC_SEQ_CST);
assert(uold == 7 && (U37)u.a == 8);
+ assert((u.b >> 37) == 0); // padding canonicalized (zero-extended)
// Wide padded width (libcall loop): _BitInt(200) has 56 padding bits in its
// 32-byte storage. Set the overlay at value level (endian-independent): low
@@ -81,11 +86,35 @@ static void test_dirty_padding(void) {
w.full = (unsigned _BitInt(256))5 | ((unsigned _BitInt(256))0xAA << 240);
S200 wold = __c11_atomic_fetch_add(&w.a, 1, __ATOMIC_SEQ_CST);
assert(wold == 5 && (S200)w.a == 6);
+ assert((w.full >> 200) == 0); // padding canonicalized (positive value)
+}
+
+// The __atomic_<op>_fetch builtins return the new value, not the old one.
+static void test_returns_new(void) {
+ S37 a = 100;
+ assert(__atomic_add_fetch(&a, 5, __ATOMIC_SEQ_CST) == 105);
+ assert(__atomic_sub_fetch(&a, 10, __ATOMIC_SEQ_CST) == 95);
+ U37 u = 0;
+ assert(__atomic_or_fetch(&u, 0xF, __ATOMIC_SEQ_CST) == 0xF);
+ S200 w = 100;
+ assert(__atomic_add_fetch(&w, 5, __ATOMIC_SEQ_CST) == 105);
+}
+
+// Each non-seq_cst ordering drives the loop's load/cmpxchg ordering.
+static void test_orderings(void) {
+ _Atomic(S37) a = 10;
+ (void)__c11_atomic_fetch_add(&a, 1, __ATOMIC_RELAXED);
+ (void)__c11_atomic_fetch_add(&a, 1, __ATOMIC_ACQUIRE);
+ (void)__c11_atomic_fetch_add(&a, 1, __ATOMIC_RELEASE);
+ (void)__c11_atomic_fetch_add(&a, 1, __ATOMIC_ACQ_REL);
+ assert((S37)a == 14);
}
int main(void) {
test_ops();
test_dirty_padding();
+ test_returns_new();
+ test_orderings();
printf("PASS\n");
return 0;
}