[llvm-branch-commits] [libcxx] [libc++][format][3/3] Improves formatting performance. (PR #108990)

Louis Dionne via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Tue Sep 17 09:05:12 PDT 2024


================
@@ -53,24 +56,150 @@ _LIBCPP_BEGIN_NAMESPACE_STD
 
 namespace __format {
 
+// A helper to limit the total size of code units written.
+class _LIBCPP_HIDE_FROM_ABI __max_output_size {
+public:
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI explicit __max_output_size(size_t __max_size) : __max_size_{__max_size} {}
+
+  // This function adjusts the size of a (bulk) write operation. It ensures the
+  // number of code units written by a __output_buffer never exceeds
+  // __max_size_ code units.
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI size_t __write_request(size_t __code_units) {
+    size_t __result =
+        __code_units_written_ < __max_size_ ? std::min(__code_units, __max_size_ - __code_units_written_) : 0;
+    __code_units_written_ += __code_units;
+    return __result;
+  }
+
+  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI size_t __code_units_written() const noexcept { return __code_units_written_; }
+
+private:
+  size_t __max_size_;
+  // The number of code units that would have been written if there were no limit.
+  // format_to_n returns this value.
+  size_t __code_units_written_{0};
+};
+
 /// A "buffer" that handles writing to the proper iterator.
 ///
 /// This helper is used together with the @ref back_insert_iterator to offer
+/// type-erasure for the formatting functions. This reduces the number of
+/// template instantiations.
+///
+/// The design is the following:
+/// - There is an external object that connects the buffer to the output.
+/// - This buffer object:
+///   - inherits publicly from this class.
+///   - has a static or dynamic buffer.
+///   - has a static member function to make space in its buffer for write
+///     operations. This can be done by increasing the size of the internal
+///     buffer or by writing the contents of the buffer to the output iterator.
+///
+///     This member function is a constructor argument, so its name is not
+///     fixed. The code uses the name __prepare_write.
+/// - The number of output code units can be limited by a __max_output_size
+///   object. This is used in format_to_n. This object:
+///   - Contains the maximum number of code units to be written.
+///   - Contains the number of code units that are requested to be written.
+///     This number is returned to the user of format_to_n.
+///   - The write functions call the object's __write_request member function.
+///     This function:
+///     - Updates the number of code units that are requested to be written.
+///     - Returns the number of code units that can be written without
+///       exceeding the maximum number of code units to be written.
+///
+/// Documentation for the buffer usage members:
+/// - __ptr_
+///   The start of the buffer.
+/// - __capacity_
+///   The number of code units that can be written. This means
+///   [__ptr_, __ptr_ + __capacity_) is a valid range to write to.
+/// - __size_
+///   The number of code units written in the buffer. The next code unit will
+///   be written at __ptr_ + __size_. This __size_ may NOT contain the total
+///   number of code units written by the __output_buffer. Whether or not it
+///   does depends on the sub-class used. Typically the total number of code
+///   units written is not interesting. It is interesting for format_to_n which
+///   has its own way to track this number.
+///
+/// Documentation for the buffer changes function:
+/// The subclasses have a function with the following signature:
+///
+///   static void __prepare_write(
+///     __output_buffer<_CharT>& __buffer, size_t __code_units);
+///
+/// This function is called when a write function writes more code units than
+/// the buffer's available space. When a __max_output_size object is provided,
+/// the number of code units is the number of code units returned from the
+/// __max_output_size::__write_request function.
+///
+/// - The __buffer contains *this. Since the class containing this function
+///   inherits from __output_buffer, it's safe to cast it to the subclass being
+///   used.
+/// - The __code_units is the number of code units the caller will write + 1.
+///   - This value does not take the available space of the buffer into account.
+///   - The push_back function is more efficient when writing before resizing;
+///     this means the buffer should always have room for one code unit. Hence
+///     the + 1 in the size.
+/// - When the function returns there is room for at least one code unit. There
+///   is no requirement that there be room for __code_units code units:
+///   - The class has some "bulk" operations. For example, __copy copies the
+///     contents of a basic_string_view to the output. If the sub-class has a
+///     fixed-size buffer, the basic_string_view may be larger than the buffer.
+///     In that case it's impossible to honor the requested size.
+///   - Guaranteeing room for at least one code unit is enough to make sure the
+///     entire output can be written, one chunk at a time.
+///     (Obviously making room one code unit at a time is slow and
+///     it's recommended to return a larger available space.)
----------------
ldionne wrote:

```suggestion

```

Is this part of the comment necessary? I don't understand what information it adds. LMK if I misunderstood something.

https://github.com/llvm/llvm-project/pull/108990

