[libcxx-commits] [PATCH] D112361: [libc++][format] Buffer changes proof-of-concept

Mark de Wever via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Mon Nov 22 09:09:07 PST 2021


Mordante planned changes to this revision.
Mordante marked an inline comment as done.
Mordante added a comment.

Please note that the changes in this patch have been applied to the other patches in the stack. So this patch can probably be abandoned.
(For now I keep it open, but remove it from the review queue.)



================
Comment at: libcxx/include/__format/buffer.h:79-85
+  _LIBCPP_HIDE_FROM_ABI void put(_CharT __c) {
+    *__buffer_it_++ = __c;
+    // Profiling showed flushing after adding is more efficient than flushing
+    // when entering the function.
+    if (__buffer_it_ == __buffer_.end())
+      flush();
+  }
----------------
vitaut wrote:
> Mordante wrote:
> > Quuxplusone wrote:
> > > >>! @vitaut wrote on D110495:
> > > > I think D112361 is a step in the right direction. You can do even better and make it represent a contiguous (not necessarily fixed-size) buffer with a single type-erased reallocation/flush function. Then it can be used to write to contiguous containers like vector and string which is a common case directly avoiding copies. This is basically what {fmt} does.
> > > 
> > > Charlie Barto also showed MSVC STL doing that approach, in his CppCon talk on Thursday. https://cppcon.digital-medium.co.uk/thursday-recordings/
> > > IIRC, basically they have `_Size, _Capacity, virtual void _Grow()`; and whenever `_Size == _Capacity` they call `_Grow`. I guess I remain fuzzy on the details of what `_Grow` would actually do for a `std::string` (does it resize? does it merely reserve?).
> > That looks a lot like what fmt does. I tested with a similar approach, but the virtual call seems to have a measurable impact over the `__fun_(__buffer_.begin(), __buffer_it_, __obj_);` method.
> The virtual call can be replaced with an indirect call (similar to `__flush_` in this diff) and it should only be called rarely (normally when you need to do a reallocation). But to see the benefits you really need to make use of writing to the buffer directly. It doesn't make much sense to test with a `push_back` API which only writes individual characters. I guess the latter is the reason why you even see virtual calls at all. The current approach while an improvement has an extra level of buffering and likely be slower in common scenarios compared to direct output. I think you can even implement the flush method using the grow method + a fixed intermediate buffer so the latter is strictly more powerful and can address cases where an extra buffering helps reducing reallocations.
It seems this comment has moved and not really attached to where the comment belongs. I've there changes:
- A direct buffer (`char*`, `std::vector<char>::begin()`, `std::string::begin()`) writes directly to the underlying character pointer. The flush there is used to keep track of the number of writes. A `std::string::iterator` is a `std::wrap_iter`. To return the proper iterator this calculation is done at the end. The direct buffer should only flush once since its capacity is `size_t(-1)`. (For `format_to_n` the capacity of the buffer is limited to `n`.)
- An indirect buffer (all other iterators) write to an internal buffer and are copied to the output during the flush. When it's a `std::back_insert_iterator` for an container with an `insert` member it uses `insert` else it just copies it.

Making the internal buffer a variable size might help. However during testing using a larger internal buffer didn't have a real affect on the benchmarks. This might change when I add insert operations for `std::string_view` in the buffer. I like to investigate this once I start working on that part.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112361/new/

https://reviews.llvm.org/D112361



More information about the libcxx-commits mailing list