[libcxx-commits] [PATCH] D155486: Optimize basic_string's memory usage

Martijn Vels via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Fri Aug 4 03:42:13 PDT 2023


mvels added a comment.

In D155486#4551579 <https://reviews.llvm.org/D155486#4551579>, @philnik wrote:

> In D155486#4544305 <https://reviews.llvm.org/D155486#4544305>, @Mordante wrote:
>
>> In D155486#4532334 <https://reviews.llvm.org/D155486#4532334>, @mvels wrote:
>>
>>> BTW: I did not understand the Apple max_size.pass failure, I adjusted the test and they all pass (check-cxx) on my local git repo on x86.
>>
>> On the backdeployment targets the new code uses the old version of the dylib. My suspicion is there is an ABI incompatibility. @philnik do you see an issue in this patch?
>
> This is an ABI break, but I don't think that's the problem. `max_size` isn't instantiated. This look to me like there is an off-by-one error somewhere.
>
> Could we split this into two patches? This is really seems like it has high potential for having subtile bugs.

Let me split the __alignment and growth changes in two separate patches



================
Comment at: libcxx/include/string:2338
+                    : __ms - 1;
+    } else {
+        // The current string is inlined (SSO). In this case we do not want to
----------------
philnik wrote:
> Why do we want to change the growth specifically for the case when we go from short to long?
This is where most of the waste comes from, and we can also argue that the most common use case isn't actually 'Growing' the string, i.e., the doubling is to keep amortized growth on continued growth.

I.e., a common pattern is:
```
std::string s;
s = SomeValue();
```

In the above case assigning a value to s is not really "growing" the string, the fact that the size changes from 0 to n is a consequence of string not having a notion of "unassigned" (which is generally good).

But if you consider the above code, and `SomeValue()` returns a string of say, length 24, then the above algorithm would simply say we "grow from 0 to 24" and applies max(24, __min_cap * 2) = 44 bytes as minimum allocation size, which is wasteful.

Assigning values once to empty strings is a very common occurrence in practice, and there is no good reason to apply amortized growth to cases where a string goes from SSO to non SSO, and not doing so (precise sizing) saves memory,


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155486/new/

https://reviews.llvm.org/D155486



More information about the libcxx-commits mailing list