[libcxx-commits] [PATCH] D102727: Use precise growth rather than amortized growth in std::string::resize/__resize_default_init.

Louis Dionne via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Tue Jul 20 12:22:31 PDT 2021


ldionne requested changes to this revision.
ldionne added a comment.
This revision now requires changes to proceed.

In D102727#2881860 <https://reviews.llvm.org/D102727#2881860>, @mvels wrote:

> The change here LGTM, but I do think the O(SQR) use case is a sharp edge. I think we should shield the 'stable' folks from these sharp edges imho. The libc++ spec states nothing about the 'growth' of resize(), which would make me caution on the side of `DTRT` instead of `does the minimum memory thing`. The latter is more a faang thing. I would imagine that a 'stable' user might be surprised with sudden regressions without maybe being too concerned with (temporary) allocations being +/- 20%.

I don't think this is ABI breaking, per my reply to Arthur. However, I want to address a misconception that seems to exist here: `_LIBCPP_ABI_UNSTABLE` is meant for improvements (in correctness, performance or other) that require breaking the ABI. They are the defaults of "tomorrow", when (if) we move to a newer ABI. `_LIBCPP_ABI_UNSTABLE` is not `_LIBCPP_ENABLE_NOT_QUITE_ASSUMED_DECISIONS`, which seems to be how you suggest using it.

In other words, we should figure out whether we want this change, and if so, then the question becomes how to do it. If it turns out not to be an ABI break, we do it by default. If it is an ABI break, then we hide it behind the macro as it has been done in the current version of this patch.

In D102727#2888692 <https://reviews.llvm.org/D102727#2888692>, @ezbr wrote:

> [...]

Thanks for your replies. I'd like to understand: how did you measure the memory improvements you found? Ideally, it would be nice to have something that we can reproduce on a large open-source code base (e.g. Chrome) so that the experiment can be verified. Is that possible? Otherwise, you have to understand I'm blindly basing the decision on "an internal Google experiment", which while I do trust, is not the best from a due diligence perspective. I'm being very conservative because silently introducing quadratic behavior in user code is pretty close to the scariest thing a maintainer can think about -- very bad but also very difficult to detect.

Also, if this came with some sort of Clang warning or clang-tidy check that flagged using `resize()` in a loop (because the standard doesn't guarantee amortized O(1) for that but people might be expecting it), I might ease up a bit. It still wouldn't fix the case where `resize()` is called in a loop lower in the call stack, but it would be good if we could detect obvious misuses automatically. What was the quadratic case you encountered and fixed? Is that something you can share?

I still left some comments on the patch in case this is an ABI break (which I don't think it is), but I feel like we need to figure out whether it's acceptable for libc++ to do this by default (either now or in a future ABI version) before we make progress.



================
Comment at: libcxx/include/__config:116
 #  define _LIBCPP_ABI_ENABLE_SHARED_PTR_TRIVIAL_ABI
+// Use exact string sizes on `resize()` rather than exponential growth.
+#  define _LIBCPP_ABI_EXACT_STRING_RESIZE
----------------
You could expand the comment a bit more -- something like the above.


================
Comment at: libcxx/include/string:3280
+    if (__n > __sz) {
+        if (__n > capacity()) __shrink_or_extend(__recommend(__n));
         append(__n - __sz, __c);
----------------
Quuxplusone wrote:
> mvels wrote:
> > ldionne wrote:
> > > Please put the `__shrink_or_extend` call on its own line (same below).
> > as per my general comment, I'd suggest putting this inside #if defined(_LIBCPP_ABI_UNSTABLE) 
> > But this change has nothing to do with the ABI.
> 
> It has to do with ABI in the sense that one can imagine compiling one TU with this patch, and one TU without this patch, and then linking them together, and having the finished program misbehave; whereas if both TUs were compiled with the old library, they'd have worked OK. In that sense, anything having to do with behavior — and that is not detectable in "API" — is classified as "ABI" by definition.
> 
> Note that this should absolutely not be guarded by `_LIBCPP_ABI_UNSTABLE`; that would make it a "receding target" that would never become the default, not even when libc++ moved to "ABI version 2," because when the stable version is 2, the unstable version will become 3. If we were going to do this as an ABI flag, we'd:
> 
> - Add a new flag `_LIBCPP_ABI_EXACT_VECTOR_RESIZE` circa https://github.com/llvm/llvm-project/blob/2ff5a56/libcxx/include/__config#L95-L96
> - Guard the new lines in `<vector>` under `_LIBCPP_ABI_EXACT_VECTOR_RESIZE`
> - (We don't mention `_LIBCPP_ABI_VERSION>=2` in the `<vector>` header, because (A) that's too easy to thinko e.g. as `_LIBCPP_ABI_VERSION==2`, and (B) it wouldn't be self-documenting. Every libc++ ABI change comes with a greppable name, so we don't end up with a confusing web of "miscellaneous ABI version 2 changes" that are hard to disentangle. This also serves as a good brake on people adding "miscellaneous version-2 changes" willy-nilly.)
> 
> Then users who compile libc++ with ABI version 2 (or newer) will receive the new behavior.
> 
> FWIW, personally I don't care whether this change is done under an ABI flag, or unconditionally, or not-at-all.
> It has to do with ABI in the sense that one can imagine compiling one TU with this patch, and one TU without this patch, and then linking them together, and having the finished program misbehave; whereas if both TUs were compiled with the old library, they'd have worked OK. In that sense, anything having to do with behavior — and that is not detectable in "API" — is classified as "ABI" by definition.

`std::string::resize` is exported from the dylib, so I don't think we have that issue. All programs should be producing an external call to `resize` and should just pick up the new implementation after this change. I also don't think this change is observable except:

1. performance-wise, with the quadratic behavior sharp edge
2. the frequency of iterator invalidation (programs relying on that are wrong but may exist - I wouldn't block this change on only that, though)

Do you see an ABI issue I'm missing?


================
Comment at: libcxx/test/libcxx/strings/basic.string/string.capacity/resize.exact.cpp:13
+
+// ADDITIONAL_COMPILE_FLAGS: -D_LIBCPP_ABI_EXACT_STRING_RESIZE
+
----------------
Enabling tests for the unstable ABI like this is not necessarily going to work. It may happen to work, but it could just as well not work, because the shared library will not have been built with the same ABI settings.

In other words, the ABI in use is not a property that can be changed by users with a mere `#define`, it's a configuration-wide setting that should be decided by vendors when they ship the library. I think what we need to do here instead is:

1. add a CI configuration that builds libc++ with the unstable ABI and runs the tests under that configuration
2. move the existing tests that do something like this to using `// REQUIRES: libcpp-abi-unstable` (we have a couple of tricky ones that define things like `_LIBCPP_ABI_ENABLE_UNIQUE_PTR_TRIVIAL_ABI`)



================
Comment at: libcxx/test/libcxx/strings/basic.string/string.capacity/resize.exact.cpp:1
+//===----------------------------------------------------------------------===//
+//
----------------
This file needs to be called `resize.exact.pass.cpp`.


================
Comment at: libcxx/test/libcxx/strings/basic.string/string.capacity/resize.exact.cpp:28
+  const size_t kSsoSize = S().capacity();
+  const size_t kAlign = 16;
+  for (int i = kSsoSize + 1; i < 1 << 10; ++i) {
----------------
Quuxplusone wrote:
> Quuxplusone wrote:
> > Should this be `alignof(std::max_align_t)`? Or what's the significance of `16` — how was it computed?
> > Regarding `kAlign`, this is [`basic_string::__alignment`, which is private.](https://github.com/llvm/llvm-project/blob/2d0f1fa/libcxx/include/string#L1577) Do you think it's okay to include this implementation detail here?
> 
> Well, //I// have no better suggestion; defer to @ldionne here.
How would you compute that? If it can be done non-intrusively, then I would do that, even if it adds a bit of complexity to the test.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102727/new/

https://reviews.llvm.org/D102727



More information about the libcxx-commits mailing list