[libcxx-commits] [libcxx] [libc++] std::byteswap support for _BitInt(N) (PR #196512)

Sun May 10 03:14:50 PDT 2026

https://github.com/xroche updated https://github.com/llvm/llvm-project/pull/196512

>From 4f6192b4d5fa674e35f85bc913c81f5ceb9f86ab Mon Sep 17 00:00:00 2001
From: Xavier Roche <xavier.roche at algolia.com>
Date: Fri, 8 May 2026 13:15:13 +0200
Subject: [PATCH 1/5] [libc++] Add generic byteswap for wide integral types

Replace the static_assert(sizeof(_Tp) == 0) failure for sizeof > 16 with
a byte-reversal loop. std::byteswap now works for _BitInt(N) with N > 128
and any future wider integer type. The loop reads the value 8 bits at a
time via right-shift + cast through unsigned char, and writes the bytes
back in reverse order. Left-shift on signed integral types has been
well-defined modulo 2^width since C++20, so the loop is valid for signed
types in C++23.

For sizeof 1, 2, 4, 8, and 16 the existing __builtin_bswap* paths are
unchanged.
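
For readers who want to try the loop outside libc++, here is a minimal
sketch of the same byte-reversal idea applied to the standard unsigned
types. The name reverse_bytes and the unsigned-only restriction are
assumptions made for illustration; this is not the libc++ code.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not part of libc++): reverse the bytes of an unsigned
// integer by extracting one byte per iteration and placing it at the
// mirrored position.
template <class T>
constexpr T reverse_bytes(T value) {
  static_assert(std::is_unsigned<T>::value, "sketch is limited to unsigned types");
  T result = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    // Byte i of the input, counting from the least significant end...
    const T byte = static_cast<T>(static_cast<unsigned char>(value >> (i * CHAR_BIT)));
    // ...becomes byte (sizeof(T) - 1 - i) of the output.
    result = static_cast<T>(result | static_cast<T>(byte << ((sizeof(T) - 1 - i) * CHAR_BIT)));
  }
  return result;
}
```

Calling reverse_bytes<std::uint32_t>(0x01234567u) yields 0x67452301u, the
same value the 32-bit test in this series expects from std::byteswap.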

Part of the [_BitInt(N) libc++ effort](https://discourse.llvm.org/t/bitint-n-support-in-libc-investigations-possible-improvements-looking-for-guidance/90063).

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
---
 libcxx/include/__bit/byteswap.h                  | 13 ++++++++++++-
 libcxx/test/libcxx/transitive_includes/cxx23.csv |  1 +
 libcxx/test/libcxx/transitive_includes/cxx26.csv |  1 +
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/libcxx/include/__bit/byteswap.h b/libcxx/include/__bit/byteswap.h
index 7ce7e069b4142..66a99713bd0dd 100644
--- a/libcxx/include/__bit/byteswap.h
+++ b/libcxx/include/__bit/byteswap.h
@@ -12,6 +12,8 @@
 
 #include <__concepts/arithmetic.h>
 #include <__config>
+#include <__cstddef/size_t.h>
+#include <climits>
 #include <cstdint>
 
 #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
@@ -42,7 +44,16 @@ template <integral _Tp>
 #    endif // __has_builtin(__builtin_bswap128)
 #  endif   // _LIBCPP_HAS_INT128
   } else {
-    static_assert(sizeof(_Tp) == 0, "byteswap is unimplemented for integral types of this size");
+    // Generic byte-reversal for wide integer types (e.g. _BitInt(N) with
+    // N > 128). Reads the value 8 bits at a time and writes the bytes
+    // back in reverse order. Left-shift on signed integral types is
+    // well-defined modulo 2^width since C++20.
+    _Tp __result = 0;
+    for (size_t __i = 0; __i < sizeof(_Tp); ++__i) {
+      __result |= static_cast<_Tp>(static_cast<unsigned char>(__val >> (__i * CHAR_BIT)))
+               << ((sizeof(_Tp) - 1 - __i) * CHAR_BIT);
+    }
+    return __result;
   }
 }
 
diff --git a/libcxx/test/libcxx/transitive_includes/cxx23.csv b/libcxx/test/libcxx/transitive_includes/cxx23.csv
index c5cc61f06678c..073f698786117 100644
--- a/libcxx/test/libcxx/transitive_includes/cxx23.csv
+++ b/libcxx/test/libcxx/transitive_includes/cxx23.csv
@@ -42,6 +42,7 @@ barrier ctime
 barrier limits
 barrier ratio
 barrier version
+bit climits
 bit cstdint
 bit limits
 bit version
diff --git a/libcxx/test/libcxx/transitive_includes/cxx26.csv b/libcxx/test/libcxx/transitive_includes/cxx26.csv
index 253cf64703076..5b4ba7918ae96 100644
--- a/libcxx/test/libcxx/transitive_includes/cxx26.csv
+++ b/libcxx/test/libcxx/transitive_includes/cxx26.csv
@@ -40,6 +40,7 @@ barrier ctime
 barrier limits
 barrier ratio
 barrier version
+bit climits
 bit cstdint
 bit limits
 bit version

>From 42721d3b8f7a0937aeb2cf87d1f58c3266df26e8 Mon Sep 17 00:00:00 2001
From: Xavier Roche <xavier.roche at algolia.com>
Date: Fri, 8 May 2026 13:15:44 +0200
Subject: [PATCH 2/5] [libc++] Reject byteswap of types with padding bits

For _BitInt(N) where N is not a multiple of CHAR_BIT, byteswap of the
storage representation moves padding bits into significant positions and
produces a value whose meaning is unspecified. Reject those cases with a
static_assert that fires when the type's value bits do not fill the
entire object representation.

The size-1 case is exempt from the check because no bytes move and no
padding gets relocated. bool, _BitInt(7), and similar types stay
identity.
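
The condition behind the static_assert can be tried on standard types.
The sketch below is an illustration only; fills_object_representation is
a hypothetical name, and it uses bool as a stand-in for a type with
padding bits because _BitInt is not available on every compiler.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <limits>

// Hypothetical trait mirroring the condition in this patch: the value bits
// (digits) plus the sign bit, if any, must account for every bit of storage.
template <class T>
constexpr bool fills_object_representation() {
  return std::numeric_limits<T>::digits + (std::numeric_limits<T>::is_signed ? 1 : 0) ==
         static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Standard fixed-width integers have no padding bits.
static_assert(fills_object_representation<std::uint32_t>(), "");
static_assert(fills_object_representation<std::int16_t>(), "");
// bool has a single value bit, so most of its storage is padding.
static_assert(!fills_object_representation<bool>(), "");
```

bool fails the condition yet still works with std::byteswap, because the
size-1 branch returns before the static_assert is reached.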

Suggested by philnik in the libc++ _BitInt review thread ("byteswap
with padding: should definitely be rejected").

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
---
 libcxx/include/__bit/byteswap.h | 56 +++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 20 deletions(-)

diff --git a/libcxx/include/__bit/byteswap.h b/libcxx/include/__bit/byteswap.h
index 66a99713bd0dd..2754265080a34 100644
--- a/libcxx/include/__bit/byteswap.h
+++ b/libcxx/include/__bit/byteswap.h
@@ -15,6 +15,7 @@
 #include <__cstddef/size_t.h>
 #include <climits>
 #include <cstdint>
+#include <limits>
 
 #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
 #  pragma GCC system_header
@@ -27,33 +28,48 @@ _LIBCPP_BEGIN_NAMESPACE_STD
 template <integral _Tp>
 [[nodiscard]] _LIBCPP_HIDE_FROM_ABI constexpr _Tp byteswap(_Tp __val) noexcept {
   if constexpr (sizeof(_Tp) == 1) {
+    // Identity for size-1 types: no bytes move and no padding gets shuffled
+    // into significant positions. bool, char, and _BitInt(N <= CHAR_BIT)
+    // all land here.
     return __val;
-  } else if constexpr (sizeof(_Tp) == 2) {
-    return __builtin_bswap16(__val);
-  } else if constexpr (sizeof(_Tp) == 4) {
-    return __builtin_bswap32(__val);
-  } else if constexpr (sizeof(_Tp) == 8) {
-    return __builtin_bswap64(__val);
+  } else {
+    // Reject types whose value bits do not fill the entire object
+    // representation (e.g. _BitInt(13) has 3 padding bits in 2 bytes of
+    // storage). The byte-level builtins below would swap those padding
+    // bits into significant positions, and the resulting value's meaning
+    // is unspecified. The size-1 case above is exempt because no bytes
+    // move.
+    static_assert(numeric_limits<_Tp>::digits + numeric_limits<_Tp>::is_signed == sizeof(_Tp) * CHAR_BIT,
+                  "std::byteswap requires a type whose value bits fill the entire "
+                  "object representation; types like _BitInt(N) where N is not a "
+                  "multiple of CHAR_BIT have padding bits and are rejected");
+    if constexpr (sizeof(_Tp) == 2) {
+      return __builtin_bswap16(__val);
+    } else if constexpr (sizeof(_Tp) == 4) {
+      return __builtin_bswap32(__val);
+    } else if constexpr (sizeof(_Tp) == 8) {
+      return __builtin_bswap64(__val);
 #  if _LIBCPP_HAS_INT128
-  } else if constexpr (sizeof(_Tp) == 16) {
+    } else if constexpr (sizeof(_Tp) == 16) {
 #    if __has_builtin(__builtin_bswap128)
-    return __builtin_bswap128(__val);
+      return __builtin_bswap128(__val);
 #    else
-    return (static_cast<_Tp>(byteswap(static_cast<uint64_t>(__val))) << 64) |
-           static_cast<_Tp>(byteswap(static_cast<uint64_t>(__val >> 64)));
+      return (static_cast<_Tp>(byteswap(static_cast<uint64_t>(__val))) << 64) |
+             static_cast<_Tp>(byteswap(static_cast<uint64_t>(__val >> 64)));
 #    endif // __has_builtin(__builtin_bswap128)
 #  endif   // _LIBCPP_HAS_INT128
-  } else {
-    // Generic byte-reversal for wide integer types (e.g. _BitInt(N) with
-    // N > 128). Reads the value 8 bits at a time and writes the bytes
-    // back in reverse order. Left-shift on signed integral types is
-    // well-defined modulo 2^width since C++20.
-    _Tp __result = 0;
-    for (size_t __i = 0; __i < sizeof(_Tp); ++__i) {
-      __result |= static_cast<_Tp>(static_cast<unsigned char>(__val >> (__i * CHAR_BIT)))
-               << ((sizeof(_Tp) - 1 - __i) * CHAR_BIT);
+    } else {
+      // Generic byte-reversal for wide integer types (e.g. _BitInt(N) with
+      // N > 128). Reads the value 8 bits at a time and writes the bytes
+      // back in reverse order. Left-shift on signed integral types is
+      // well-defined modulo 2^width since C++20.
+      _Tp __result = 0;
+      for (size_t __i = 0; __i < sizeof(_Tp); ++__i) {
+        __result |= static_cast<_Tp>(static_cast<unsigned char>(__val >> (__i * CHAR_BIT)))
+                 << ((sizeof(_Tp) - 1 - __i) * CHAR_BIT);
+      }
+      return __result;
     }
-    return __result;
   }
 }
 

>From acf2941190cabc38718b054efce12caa330b8ce5 Mon Sep 17 00:00:00 2001
From: Xavier Roche <xavier.roche at algolia.com>
Date: Fri, 8 May 2026 13:16:06 +0200
Subject: [PATCH 3/5] [libc++][test] Add std::byteswap tests for _BitInt(N)

Two test files for the changes in the prior two commits.

byteswap.bitint.pass.cpp covers byte-aligned _BitInt(N) widths from 8 up
to __BITINT_MAXWIDTH__: builtins for sizeof <= 16, the new generic loop
for sizeof > 16. The size-1 case includes _BitInt(7) (identity, despite
the padding bit) so the asymmetry with _BitInt(13) is pinned. The wide
case has a ramp pattern with a distinct byte at every position, which
surfaces any off-by-one in the loop indexing, plus low-byte and high-byte
spot checks. Wide-width tests gate on __BITINT_MAXWIDTH__ so non-x86 /
non-RISC-V64 targets compile a smaller subset.
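
The ramp check can be sketched on a 64-bit value. Everything below is an
illustration under assumptions: swap_under_test is a hypothetical
stand-in for the function being tested, and the helper names do not
appear in the test files.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the function under test: a plain byte reversal.
constexpr std::uint64_t swap_under_test(std::uint64_t v) {
  std::uint64_t r = 0;
  for (std::size_t i = 0; i < sizeof(v); ++i)
    r |= static_cast<std::uint64_t>(static_cast<unsigned char>(v >> (i * CHAR_BIT)))
         << ((sizeof(v) - 1 - i) * CHAR_BIT);
  return r;
}

// Build the ramp: byte value i is stored at byte position i.
constexpr std::uint64_t make_ramp() {
  std::uint64_t ramp = 0;
  for (std::size_t i = 0; i < sizeof(ramp); ++i)
    ramp |= static_cast<std::uint64_t>(i) << (i * CHAR_BIT);
  return ramp;
}

// True when byte value i is found at the mirrored position for every i.
constexpr bool ramp_is_reversed(std::uint64_t swapped) {
  for (std::size_t i = 0; i < sizeof(swapped); ++i)
    if (static_cast<unsigned char>(swapped >> ((sizeof(swapped) - 1 - i) * CHAR_BIT)) != i)
      return false;
  return true;
}
```

Because every byte is distinct, a swap that is off by one position fails
the check at the first mismatching index instead of passing by
coincidence, which a round-trip test alone cannot guarantee.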

byteswap.bitint.verify.cpp pins the static_assert added by the previous
commit. Covers _BitInt(13), _BitInt(17), _BitInt(33), _BitInt(65) on all
targets, plus _BitInt(129) and _BitInt(255) where the platform supports
them. The expected-error-re pattern matches both Clang's "static
assertion" and GCC's "static_assert" wordings.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
---
 .../std/numerics/bit/byteswap.bitint.pass.cpp | 131 ++++++++++++++++++
 .../numerics/bit/byteswap.bitint.verify.cpp   |  78 +++++++++++
 2 files changed, 209 insertions(+)
 create mode 100644 libcxx/test/std/numerics/bit/byteswap.bitint.pass.cpp
 create mode 100644 libcxx/test/std/numerics/bit/byteswap.bitint.verify.cpp

diff --git a/libcxx/test/std/numerics/bit/byteswap.bitint.pass.cpp b/libcxx/test/std/numerics/bit/byteswap.bitint.pass.cpp
new file mode 100644
index 0000000000000..ce432c1432c15
--- /dev/null
+++ b/libcxx/test/std/numerics/bit/byteswap.bitint.pass.cpp
@@ -0,0 +1,131 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// UNSUPPORTED: c++03, c++11, c++14, c++17, c++20
+
+// <bit>
+
+// std::byteswap for _BitInt(N).
+//
+// Byte-aligned widths (N % CHAR_BIT == 0) work via the existing builtins
+// for sizeof <= 16 and via the new generic loop for sizeof > 16. Non-byte-
+// aligned widths are rejected by static_assert; that case is covered in
+// byteswap.bitint.verify.cpp.
+
+#include <bit>
+#include <cassert>
+#include <cstdint>
+
+#include "test_macros.h"
+
+#if TEST_HAS_EXTENSION(bit_int)
+
+template <class T>
+constexpr void test_roundtrip(T v) {
+  assert(std::byteswap(std::byteswap(v)) == v);
+  ASSERT_SAME_TYPE(decltype(std::byteswap(v)), T);
+  ASSERT_NOEXCEPT(std::byteswap(v));
+}
+
+constexpr bool test() {
+  // sizeof == 1: identity. The size-1 branch returns the input unchanged
+  // and the padding-bit static_assert is bypassed -- no bytes move and no
+  // padding gets shuffled into significant positions, so non-byte-aligned
+  // widths up to CHAR_BIT (e.g. _BitInt(7)) are also identity.
+  assert(std::byteswap(static_cast<unsigned _BitInt(8)>(0xAB)) == static_cast<unsigned _BitInt(8)>(0xAB));
+  test_roundtrip<unsigned _BitInt(8)>(0xAB);
+  test_roundtrip<signed _BitInt(8)>(0x12);
+  // _BitInt(7) signed has a padding bit but stays identity at sizeof == 1.
+  assert(std::byteswap(static_cast<signed _BitInt(7)>(42)) == static_cast<signed _BitInt(7)>(42));
+  assert(std::byteswap(static_cast<unsigned _BitInt(7)>(42)) == static_cast<unsigned _BitInt(7)>(42));
+
+  // sizeof == 2: __builtin_bswap16
+  assert(std::byteswap(static_cast<unsigned _BitInt(16)>(0xCDEF)) == static_cast<unsigned _BitInt(16)>(0xEFCD));
+  test_roundtrip<unsigned _BitInt(16)>(0xCDEF);
+  test_roundtrip<signed _BitInt(16)>(0x1234);
+
+  // sizeof == 4: __builtin_bswap32
+  assert(std::byteswap(static_cast<unsigned _BitInt(32)>(0x01234567U)) ==
+         static_cast<unsigned _BitInt(32)>(0x67452301U));
+  test_roundtrip<unsigned _BitInt(32)>(0x01234567U);
+  test_roundtrip<signed _BitInt(32)>(0x01234567);
+
+  // sizeof == 8: __builtin_bswap64
+  assert(std::byteswap(static_cast<unsigned _BitInt(64)>(0x0123456789ABCDEFULL)) ==
+         static_cast<unsigned _BitInt(64)>(0xEFCDAB8967452301ULL));
+  test_roundtrip<unsigned _BitInt(64)>(0x0123456789ABCDEFULL);
+  test_roundtrip<signed _BitInt(64)>(0x0123456789ABCDEFLL);
+
+#  if __BITINT_MAXWIDTH__ >= 128
+  // sizeof == 16: __builtin_bswap128 (or 2x bswap64 fallback). Same path
+  // as the existing __int128_t / __uint128_t coverage in byteswap.pass.cpp.
+  unsigned _BitInt(128) v128 =
+      (static_cast<unsigned _BitInt(128)>(0x0123456789ABCDEFULL) << 64) |
+      static_cast<unsigned _BitInt(128)>(0x13579BDF02468ACEULL);
+  test_roundtrip<unsigned _BitInt(128)>(v128);
+  test_roundtrip<signed _BitInt(128)>(static_cast<signed _BitInt(128)>(v128));
+#  endif
+
+#  if __BITINT_MAXWIDTH__ >= 256
+  // sizeof == 32: hits the new generic loop fallback.
+  unsigned _BitInt(256) v256 =
+      (static_cast<unsigned _BitInt(256)>(0xDEADBEEFCAFEBABEULL) << 128) |
+      (static_cast<unsigned _BitInt(256)>(0x1234567890ABCDEFULL) << 64) |
+      static_cast<unsigned _BitInt(256)>(0xFEDCBA9876543210ULL);
+  test_roundtrip<unsigned _BitInt(256)>(v256);
+  test_roundtrip<signed _BitInt(256)>(static_cast<signed _BitInt(256)>(v256));
+
+  // Spot check for the wide loop: low byte of input must end up as the
+  // high byte of the output, and high byte of input as the low byte.
+  unsigned _BitInt(256) lo_only = 0xAB;
+  auto lo_swapped               = std::byteswap(lo_only);
+  assert(static_cast<unsigned char>(lo_swapped >> ((sizeof(lo_swapped) - 1) * CHAR_BIT)) == 0xAB);
+  unsigned _BitInt(256) hi_only =
+      static_cast<unsigned _BitInt(256)>(0xCD) << ((sizeof(unsigned _BitInt(256)) - 1) * CHAR_BIT);
+  auto hi_swapped = std::byteswap(hi_only);
+  assert(static_cast<unsigned char>(hi_swapped) == 0xCD);
+
+  // Ramp test: distinct byte at every position so an off-by-one in
+  // the loop indexing surfaces directly. Store byte value i at byte
+  // position i (0..31), then verify byteswap reverses the byte sequence.
+  unsigned _BitInt(256) ramp = 0;
+  for (int i = 0; i < 32; ++i)
+    ramp |= static_cast<unsigned _BitInt(256)>(i) << (i * CHAR_BIT);
+  auto ramp_swapped = std::byteswap(ramp);
+  for (int i = 0; i < 32; ++i)
+    assert(static_cast<unsigned char>(ramp_swapped >> ((31 - i) * CHAR_BIT)) == i);
+#  endif
+
+#  if __BITINT_MAXWIDTH__ >= 1024
+  // Larger width still in the generic loop.
+  unsigned _BitInt(1024) v1024 = static_cast<unsigned _BitInt(1024)>(0xAB) << ((128 - 1) * CHAR_BIT);
+  test_roundtrip<unsigned _BitInt(1024)>(v1024);
+#  endif
+
+#  if __BITINT_MAXWIDTH__ >= 4096
+  // Largest width tested. Picked to cover the upper end of what the
+  // dev-branch experiments exercised; values larger than this take a
+  // long time to constexpr-evaluate without adding much coverage.
+  unsigned _BitInt(4096) v4096 = static_cast<unsigned _BitInt(4096)>(0xAB) << ((512 - 1) * CHAR_BIT);
+  test_roundtrip<unsigned _BitInt(4096)>(v4096);
+#  endif
+
+  return true;
+}
+
+int main(int, char**) {
+  test();
+  static_assert(test());
+  return 0;
+}
+
+#else
+
+int main(int, char**) { return 0; }
+
+#endif
diff --git a/libcxx/test/std/numerics/bit/byteswap.bitint.verify.cpp b/libcxx/test/std/numerics/bit/byteswap.bitint.verify.cpp
new file mode 100644
index 0000000000000..7ff60a9c94463
--- /dev/null
+++ b/libcxx/test/std/numerics/bit/byteswap.bitint.verify.cpp
@@ -0,0 +1,78 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// UNSUPPORTED: c++03, c++11, c++14, c++17, c++20
+
+// <bit>
+
+// std::byteswap rejects _BitInt(N) where N is not a multiple of CHAR_BIT.
+//
+// The byte-level builtins (and the generic loop fallback) treat the
+// storage representation as the value, so for a type with padding bits
+// they would shuffle padding into significant positions and produce a
+// value whose semantic meaning is unspecified. The static_assert added
+// in [libc++] Reject byteswap of types with padding bits pins the
+// rejection so the diagnostic does not regress silently into a wrong
+// value.
+
+#include <bit>
+
+#include "test_macros.h"
+
+#if TEST_HAS_EXTENSION(bit_int)
+
+void f_unsigned_13() {
+  unsigned _BitInt(13) v = 0;
+  // expected-error-re@*:* {{{{(static assertion|static_assert)}} failed{{.*}}"std::byteswap requires{{.*}}"}}
+  (void)std::byteswap(v);
+}
+
+void f_signed_13() {
+  signed _BitInt(13) v = 0;
+  // expected-error-re@*:* {{{{(static assertion|static_assert)}} failed{{.*}}"std::byteswap requires{{.*}}"}}
+  (void)std::byteswap(v);
+}
+
+void f_unsigned_17() {
+  unsigned _BitInt(17) v = 0;
+  // expected-error-re@*:* {{{{(static assertion|static_assert)}} failed{{.*}}"std::byteswap requires{{.*}}"}}
+  (void)std::byteswap(v);
+}
+
+void f_signed_33() {
+  signed _BitInt(33) v = 0;
+  // expected-error-re@*:* {{{{(static assertion|static_assert)}} failed{{.*}}"std::byteswap requires{{.*}}"}}
+  (void)std::byteswap(v);
+}
+
+void f_unsigned_65() {
+  unsigned _BitInt(65) v = 0;
+  // expected-error-re@*:* {{{{(static assertion|static_assert)}} failed{{.*}}"std::byteswap requires{{.*}}"}}
+  (void)std::byteswap(v);
+}
+
+#  if __BITINT_MAXWIDTH__ >= 129
+// _BitInt(129) is wider than __int128 and only available where
+// __BITINT_MAXWIDTH__ supports it (x86 / RISC-V 64). The wide-type
+// generic loop also relies on the rejection; cover it here.
+void f_unsigned_129() {
+  unsigned _BitInt(129) v = 0;
+  // expected-error-re@*:* {{{{(static assertion|static_assert)}} failed{{.*}}"std::byteswap requires{{.*}}"}}
+  (void)std::byteswap(v);
+}
+
+void f_signed_255() {
+  signed _BitInt(255) v = 0;
+  // expected-error-re@*:* {{{{(static assertion|static_assert)}} failed{{.*}}"std::byteswap requires{{.*}}"}}
+  (void)std::byteswap(v);
+}
+#  endif
+
+#else
+// expected-no-diagnostics
+#endif

>From 87558ef98a672415caedaf3311390112395adf23 Mon Sep 17 00:00:00 2001
From: Xavier Roche <xavier.roche at algolia.com>
Date: Sun, 10 May 2026 11:53:45 +0200
Subject: [PATCH 4/5] [libc++] Defer byteswap to __builtin_bswapg when
 available

Address philnik's review on PR #196512. The fast path is now a single
call to __builtin_bswapg, which Clang added in November 2025 (PR
llvm/llvm-project#162433). The builtin handles every standard integer
type plus _BitInt(N) where N is a multiple of 16 (or N == 8 for the
size-1 identity), and Clang itself diagnoses non-multiple-of-16
_BitInt with a clear "must be a multiple of 16 bits for byte swapping"
message. That replaces the static_assert and the size-1 / wide-loop
chain we shipped earlier.

The previous implementation stays as a fallback under
!__has_builtin(__builtin_bswapg) for compilers that predate Clang 22.
libc++ supports Clang 21+, and __builtin_bswapg is only in Clang 22+,
so the fallback is reachable today on Clang 21. If we drop Clang 21
support (a separate libc++ policy decision) the fallback can go.
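
The dispatch pattern can be sketched outside libc++ as follows. This is a
minimal illustration, not the header's code: swap_bytes is a hypothetical
name, the sketch is restricted to unsigned types, and the fallback is a
plain loop rather than the per-size builtin chain.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Compilers that lack __has_builtin treat every query as "not available".
#ifndef __has_builtin
#  define __has_builtin(x) 0
#endif

// Hypothetical wrapper: prefer the generic builtin, otherwise reverse by hand.
template <class T>
T swap_bytes(T value) {
  static_assert(std::is_unsigned<T>::value, "sketch is limited to unsigned types");
#if __has_builtin(__builtin_bswapg)
  // Fast path; per this commit message the builtin exists in Clang 22+.
  return __builtin_bswapg(value);
#else
  // Fallback: move byte i to the mirrored position (sizeof(T) - 1 - i).
  T result = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    const T byte = static_cast<T>(static_cast<unsigned char>(value >> (i * CHAR_BIT)));
    result = static_cast<T>(result | static_cast<T>(byte << ((sizeof(T) - 1 - i) * CHAR_BIT)));
  }
  return result;
#endif
}
```

Both branches should agree on every input, so the same assertions hold
whichever path the compiler selects.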

Side effects:

- The new <__cstddef/size_t.h> / <climits> / <limits> includes only
  apply on the fallback path now and are guarded behind the same
  __has_builtin check.
- The bit climits transitive-include row added in the previous round
  goes away on Clang 22+; reverted in cxx23.csv and cxx26.csv. The
  fallback path still pulls climits in transitively but Clang 21
  builds aren't measured by the in-tree CI.
- byteswap.bitint.verify.cpp's expected-error-re regex now matches
  either spelling: the static_assert message on the fallback path and
  the "_BitInt type ... must be a multiple of 16 bits" message on the
  bswapg path. Verified locally: clang-20 fallback fires the
  static_assert, in-tree clang-23 bswapg fires Clang's diagnostic.
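
The two-way alternation in the updated regex can be tried with
std::regex. In this sketch matches_rejection is a hypothetical helper,
and the sample strings in the note that follows are hand-written
stand-ins assembled from the fragments quoted above, not captured
compiler output.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: the alternation from the updated verify test, written
// as an ECMAScript regex. The first branch covers the static_assert wording
// of the fallback path, the second the wording attributed to the builtin.
inline bool matches_rejection(const std::string& diagnostic) {
  static const std::regex pattern(
      R"re(((static assertion|static_assert) failed.*"std::byteswap requires.*")|(_BitInt type.*must be a multiple of 16 bits))re");
  return std::regex_search(diagnostic, pattern);
}
```

A stand-in such as "_BitInt type (13 bits) must be a multiple of 16 bits
for byte swapping" matches the second branch, while an unrelated error
such as "no matching function for call to 'byteswap'" matches neither.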

Verified locally on Clang 23 (bswapg path) and Clang 20 (fallback
path) with the full bit suite (18/18) and the transitive-includes
test (114/114) passing on both. Cross-arch QEMU on 12 targets all
green.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
---
 libcxx/include/__bit/byteswap.h               | 29 +++++++++++-----
 .../test/libcxx/transitive_includes/cxx23.csv |  1 -
 .../test/libcxx/transitive_includes/cxx26.csv |  1 -
 .../numerics/bit/byteswap.bitint.verify.cpp   | 34 +++++++++++--------
 4 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/libcxx/include/__bit/byteswap.h b/libcxx/include/__bit/byteswap.h
index 2754265080a34..9f075e9c1d410 100644
--- a/libcxx/include/__bit/byteswap.h
+++ b/libcxx/include/__bit/byteswap.h
@@ -12,10 +12,13 @@
 
 #include <__concepts/arithmetic.h>
 #include <__config>
-#include <__cstddef/size_t.h>
-#include <climits>
 #include <cstdint>
-#include <limits>
+
+#if !__has_builtin(__builtin_bswapg)
+#  include <__cstddef/size_t.h>
+#  include <climits>
+#  include <limits>
+#endif
 
 #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
 #  pragma GCC system_header
@@ -27,6 +30,15 @@ _LIBCPP_BEGIN_NAMESPACE_STD
 
 template <integral _Tp>
 [[nodiscard]] _LIBCPP_HIDE_FROM_ABI constexpr _Tp byteswap(_Tp __val) noexcept {
+#  if __has_builtin(__builtin_bswapg)
+  // __builtin_bswapg handles all standard integer types as well as
+  // _BitInt(N) with N a multiple of 16 (or N == 8 for identity). Padding-bit
+  // and unsupported-width cases are rejected by Clang with a clear
+  // diagnostic, so no static_assert is needed here.
+  return __builtin_bswapg(__val);
+#  else
+  // Fallback for compilers that do not provide __builtin_bswapg
+  // (added in Clang 22; libc++ supports Clang 21+).
   if constexpr (sizeof(_Tp) == 1) {
     // Identity for size-1 types: no bytes move and no padding gets shuffled
     // into significant positions. bool, char, and _BitInt(N <= CHAR_BIT)
@@ -49,15 +61,15 @@ template <integral _Tp>
       return __builtin_bswap32(__val);
     } else if constexpr (sizeof(_Tp) == 8) {
       return __builtin_bswap64(__val);
-#  if _LIBCPP_HAS_INT128
+#    if _LIBCPP_HAS_INT128
     } else if constexpr (sizeof(_Tp) == 16) {
-#    if __has_builtin(__builtin_bswap128)
+#      if __has_builtin(__builtin_bswap128)
       return __builtin_bswap128(__val);
-#    else
+#      else
       return (static_cast<_Tp>(byteswap(static_cast<uint64_t>(__val))) << 64) |
              static_cast<_Tp>(byteswap(static_cast<uint64_t>(__val >> 64)));
-#    endif // __has_builtin(__builtin_bswap128)
-#  endif   // _LIBCPP_HAS_INT128
+#      endif // __has_builtin(__builtin_bswap128)
+#    endif   // _LIBCPP_HAS_INT128
     } else {
       // Generic byte-reversal for wide integer types (e.g. _BitInt(N) with
       // N > 128). Reads the value 8 bits at a time and writes the bytes
@@ -71,6 +83,7 @@ template <integral _Tp>
       return __result;
     }
   }
+#  endif     // __has_builtin(__builtin_bswapg)
 }
 
 #endif // _LIBCPP_STD_VER >= 23
diff --git a/libcxx/test/libcxx/transitive_includes/cxx23.csv b/libcxx/test/libcxx/transitive_includes/cxx23.csv
index 073f698786117..c5cc61f06678c 100644
--- a/libcxx/test/libcxx/transitive_includes/cxx23.csv
+++ b/libcxx/test/libcxx/transitive_includes/cxx23.csv
@@ -42,7 +42,6 @@ barrier ctime
 barrier limits
 barrier ratio
 barrier version
-bit climits
 bit cstdint
 bit limits
 bit version
diff --git a/libcxx/test/libcxx/transitive_includes/cxx26.csv b/libcxx/test/libcxx/transitive_includes/cxx26.csv
index 5b4ba7918ae96..253cf64703076 100644
--- a/libcxx/test/libcxx/transitive_includes/cxx26.csv
+++ b/libcxx/test/libcxx/transitive_includes/cxx26.csv
@@ -40,7 +40,6 @@ barrier ctime
 barrier limits
 barrier ratio
 barrier version
-bit climits
 bit cstdint
 bit limits
 bit version
diff --git a/libcxx/test/std/numerics/bit/byteswap.bitint.verify.cpp b/libcxx/test/std/numerics/bit/byteswap.bitint.verify.cpp
index 7ff60a9c94463..760d3fa4a995a 100644
--- a/libcxx/test/std/numerics/bit/byteswap.bitint.verify.cpp
+++ b/libcxx/test/std/numerics/bit/byteswap.bitint.verify.cpp
@@ -10,15 +10,19 @@
 
 // <bit>
 
-// std::byteswap rejects _BitInt(N) where N is not a multiple of CHAR_BIT.
+// std::byteswap rejects _BitInt(N) where the bit width is not a multiple
+// of 16. The diagnostic comes from one of two paths depending on the
+// compiler version:
 //
-// The byte-level builtins (and the generic loop fallback) treat the
-// storage representation as the value, so for a type with padding bits
-// they would shuffle padding into significant positions and produce a
-// value whose semantic meaning is unspecified. The static_assert added
-// in [libc++] Reject byteswap of types with padding bits pins the
-// rejection so the diagnostic does not regress silently into a wrong
-// value.
+// - On Clang 22+ where __builtin_bswapg is available, byteswap defers to
+//   the builtin and the rejection diagnostic is Clang's "_BitInt type
+//   ... must be a multiple of 16 bits for byte swapping".
+// - On Clang 21 (the libc++ minimum) the fallback path runs, which uses
+//   a static_assert to reject types whose value bits do not fill the
+//   entire object representation.
+//
+// The regex below matches either spelling so the test is stable across
+// both paths.
 
 #include <bit>
 
@@ -28,31 +32,31 @@
 
 void f_unsigned_13() {
   unsigned _BitInt(13) v = 0;
-  // expected-error-re@*:* {{{{(static assertion|static_assert)}} failed{{.*}}"std::byteswap requires{{.*}}"}}
+  // expected-error-re@*:* {{{{(((static assertion|static_assert) failed.*"std::byteswap requires.*")|(_BitInt type.*must be a multiple of 16 bits))}}}}
   (void)std::byteswap(v);
 }
 
 void f_signed_13() {
   signed _BitInt(13) v = 0;
-  // expected-error-re@*:* {{{{(static assertion|static_assert)}} failed{{.*}}"std::byteswap requires{{.*}}"}}
+  // expected-error-re@*:* {{{{(((static assertion|static_assert) failed.*"std::byteswap requires.*")|(_BitInt type.*must be a multiple of 16 bits))}}}}
   (void)std::byteswap(v);
 }
 
 void f_unsigned_17() {
   unsigned _BitInt(17) v = 0;
-  // expected-error-re@*:* {{{{(static assertion|static_assert)}} failed{{.*}}"std::byteswap requires{{.*}}"}}
+  // expected-error-re@*:* {{{{(((static assertion|static_assert) failed.*"std::byteswap requires.*")|(_BitInt type.*must be a multiple of 16 bits))}}}}
   (void)std::byteswap(v);
 }
 
 void f_signed_33() {
   signed _BitInt(33) v = 0;
-  // expected-error-re@*:* {{{{(static assertion|static_assert)}} failed{{.*}}"std::byteswap requires{{.*}}"}}
+  // expected-error-re@*:* {{{{(((static assertion|static_assert) failed.*"std::byteswap requires.*")|(_BitInt type.*must be a multiple of 16 bits))}}}}
   (void)std::byteswap(v);
 }
 
 void f_unsigned_65() {
   unsigned _BitInt(65) v = 0;
-  // expected-error-re@*:* {{{{(static assertion|static_assert)}} failed{{.*}}"std::byteswap requires{{.*}}"}}
+  // expected-error-re@*:* {{{{(((static assertion|static_assert) failed.*"std::byteswap requires.*")|(_BitInt type.*must be a multiple of 16 bits))}}}}
   (void)std::byteswap(v);
 }
 
@@ -62,13 +66,13 @@ void f_unsigned_65() {
 // generic loop also relies on the rejection; cover it here.
 void f_unsigned_129() {
   unsigned _BitInt(129) v = 0;
-  // expected-error-re@*:* {{{{(static assertion|static_assert)}} failed{{.*}}"std::byteswap requires{{.*}}"}}
+  // expected-error-re@*:* {{{{(((static assertion|static_assert) failed.*"std::byteswap requires.*")|(_BitInt type.*must be a multiple of 16 bits))}}}}
   (void)std::byteswap(v);
 }
 
 void f_signed_255() {
   signed _BitInt(255) v = 0;
-  // expected-error-re@*:* {{{{(static assertion|static_assert)}} failed{{.*}}"std::byteswap requires{{.*}}"}}
+  // expected-error-re@*:* {{{{(((static assertion|static_assert) failed.*"std::byteswap requires.*")|(_BitInt type.*must be a multiple of 16 bits))}}}}
   (void)std::byteswap(v);
 }
 #  endif

>From 02b6ab398e8810ed94dcc474c46a66da67a49f23 Mon Sep 17 00:00:00 2001
From: Xavier Roche <xavier.roche at algolia.com>
Date: Sun, 10 May 2026 12:14:33 +0200
Subject: [PATCH 5/5] [libc++] Pull byteswap fallback's includes outside the
 __has_builtin guard

The __has_builtin(__builtin_bswapg) guard in __bit/byteswap.h had the
fallback path's three extra includes (<__cstddef/size_t.h>, <climits>,
<limits>) inside the guard. That made the transitive include set
differ between Clang 22+ (no climits in transitive) and Clang 21 / GCC
(climits in transitive), so the single transitive_includes CSV could
only match one. AIX-64-bit and android-x86-NDK CI use the fallback
path and surfaced the mismatch.

Move the three includes outside the guard so both paths share the
same transitive set, and add the bit climits row back to cxx23.csv
and cxx26.csv. Cost on the bswapg path is negligible (those headers
are already in <bit>'s transitive set via other paths).

Verified locally: in-tree clang (Clang 23, bswapg path) and g++-13
(targeted fallback compile) both accept the result. transitive_includes
test 114/114 passes, bit suite 18/18 passes.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.6 <noreply at anthropic.com>
---
 libcxx/include/__bit/byteswap.h                  | 9 +++------
 libcxx/test/libcxx/transitive_includes/cxx23.csv | 1 +
 libcxx/test/libcxx/transitive_includes/cxx26.csv | 1 +
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/libcxx/include/__bit/byteswap.h b/libcxx/include/__bit/byteswap.h
index 9f075e9c1d410..8d830a5c21637 100644
--- a/libcxx/include/__bit/byteswap.h
+++ b/libcxx/include/__bit/byteswap.h
@@ -12,13 +12,10 @@
 
 #include <__concepts/arithmetic.h>
 #include <__config>
+#include <__cstddef/size_t.h>
+#include <climits>
 #include <cstdint>
-
-#if !__has_builtin(__builtin_bswapg)
-#  include <__cstddef/size_t.h>
-#  include <climits>
-#  include <limits>
-#endif
+#include <limits>
 
 #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
 #  pragma GCC system_header
diff --git a/libcxx/test/libcxx/transitive_includes/cxx23.csv b/libcxx/test/libcxx/transitive_includes/cxx23.csv
index c5cc61f06678c..073f698786117 100644
--- a/libcxx/test/libcxx/transitive_includes/cxx23.csv
+++ b/libcxx/test/libcxx/transitive_includes/cxx23.csv
@@ -42,6 +42,7 @@ barrier ctime
 barrier limits
 barrier ratio
 barrier version
+bit climits
 bit cstdint
 bit limits
 bit version
diff --git a/libcxx/test/libcxx/transitive_includes/cxx26.csv b/libcxx/test/libcxx/transitive_includes/cxx26.csv
index 253cf64703076..5b4ba7918ae96 100644
--- a/libcxx/test/libcxx/transitive_includes/cxx26.csv
+++ b/libcxx/test/libcxx/transitive_includes/cxx26.csv
@@ -40,6 +40,7 @@ barrier ctime
 barrier limits
 barrier ratio
 barrier version
+bit climits
 bit cstdint
 bit limits
 bit version