[libcxx-commits] [libcxx] [libc++][string] Fix unnecessary padding for basic_string's __short (PR #135973)

Gábor Spaits via libcxx-commits libcxx-commits at lists.llvm.org
Thu Apr 17 04:57:58 PDT 2025


https://github.com/spaits updated https://github.com/llvm/llvm-project/pull/135973

>From 7e348e5a9bb86c97068e486c69a8ebd927a10ed0 Mon Sep 17 00:00:00 2001
From: Gabor Spaits <gaborspaits1 at gmail.com>
Date: Wed, 16 Apr 2025 16:07:20 +0200
Subject: [PATCH] Fix unnecessary padding for basic_string's __short

Originally, padding in the struct `__short` was done with arrays, but when no padding was required, we got undefined behavior due to a zero-length array.
C++17 11.3.4 Arrays p1:
```
If the <<length of array>> is present, it shall be a converted constant
expression of type std::size_t and its value shall be greater than zero.
```
A fix was proposed in commit d95597dc06c510ad7fbf00a43583c54d38f79aa7, which worked around the undefined behavior by wrapping the padding array in a struct if there was padding required and omitting the array altogether, with the help of a specialization, when the padding is zero.

This change doesn't just work around the problem but changes the semantics of the code, introducing an issue.

Contrary to C, in C++ empty structs and classes must have a size greater than zero.
C++ 17 12 Classes p4:
```
Complete objects and member subobjects of class type shall have nonzero size.
```
So whenever there is no padding required we insert an empty struct-sized padding, which is 1 byte on X86_64 (but we won't feel it there, since on that platform padding is required if I am correct).

I would like to propose another standard compliant solution using unnamed 0 width bitfields.
12.2.4 Bit-fields p2:
```
As a special case, an unnamed bit-field with a width of zero
specifies alignment of the next bit-field at an allocation unit boundary. Only when declaring an unnamed
bit-field may the value of the constant-expression be equal to zero.
```
---
 libcxx/include/string | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/libcxx/include/string b/libcxx/include/string
index e7e541e31432d..8f168572d6be6 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -715,14 +715,6 @@ struct __can_be_converted_to_string_view
 struct __uninitialized_size_tag {};
 struct __init_with_sentinel_tag {};
 
-template <size_t _PaddingSize>
-struct __padding {
-  char __padding_[_PaddingSize];
-};
-
-template <>
-struct __padding<0> {};
-
 template <class _CharT, class _Traits, class _Allocator>
 class basic_string {
 public:
@@ -827,9 +819,9 @@ private:
 
   struct __short {
     value_type __data_[__min_cap];
-    _LIBCPP_NO_UNIQUE_ADDRESS __padding<sizeof(value_type) - 1> __padding_;
     unsigned char __size_    : 7;
     unsigned char __is_long_ : 1;
+    value_type : 0;
   };
 
   // The __endian_factor is required because the field we use to store the size
@@ -879,7 +871,7 @@ private:
       unsigned char __is_long_ : 1;
       unsigned char __size_    : 7;
     };
-    _LIBCPP_NO_UNIQUE_ADDRESS __padding<sizeof(value_type) - 1> __padding_;
+    value_type : 0;
     value_type __data_[__min_cap];
   };
 



More information about the libcxx-commits mailing list