[llvm-branch-commits] [libcxx] [libc++][format][3/3] Improves formatting performance. (PR #108990)
Louis Dionne via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Tue Sep 17 09:05:14 PDT 2024
================
@@ -53,24 +56,150 @@ _LIBCPP_BEGIN_NAMESPACE_STD
namespace __format {
+// A helper to limit the total size of code units written.
+class _LIBCPP_HIDE_FROM_ABI __max_output_size {
+public:
+ [[nodiscard]] _LIBCPP_HIDE_FROM_ABI explicit __max_output_size(size_t __max_size) : __max_size_{__max_size} {}
+
+ // This function adjusts the size of (bulk) write operations. It ensures the
+ // number of code units written by a __output_buffer never exceeds
+ // __max_size_ code units.
+ [[nodiscard]] _LIBCPP_HIDE_FROM_ABI size_t __write_request(size_t __code_units) {
+ size_t __result =
+ __code_units_written_ < __max_size_ ? std::min(__code_units, __max_size_ - __code_units_written_) : 0;
+ __code_units_written_ += __code_units;
+ return __result;
+ }
+
+ [[nodiscard]] _LIBCPP_HIDE_FROM_ABI size_t __code_units_written() const noexcept { return __code_units_written_; }
+
+private:
+ size_t __max_size_;
+ // The code units that would have been written if there was no limit.
+ // format_to_n returns this value.
+ size_t __code_units_written_{0};
+};
+
/// A "buffer" that handles writing to the proper iterator.
///
/// This helper is used together with the @ref back_insert_iterator to offer
/// type-erasure for the formatting functions. This reduces the number of
/// template instantiations.
+///
+/// The design is the following:
+/// - There is an external object that connects the buffer to the output.
+/// - This buffer object:
+/// - inherits publicly from this class.
+/// - has a static or dynamic buffer.
+/// - has a static member function to make space in its buffer for write
+/// operations. This can be done by increasing the size of the internal
+/// buffer or by writing the contents of the buffer to the output iterator.
+///
+/// This member function is a constructor argument, so its name is not
+/// fixed. The code uses the name __prepare_write.
+/// - The number of output code units can be limited by a __max_output_size
+/// object. This is used in format_to_n. This object:
+/// - Contains the maximum number of code units to be written.
+/// - Contains the number of code units that are requested to be written.
+/// This number is returned to the user of format_to_n.
+/// - The write functions call the object's __write_request member function.
+/// This function:
+/// - Updates the number of code units that are requested to be written.
+/// - Returns the number of code units that can be written without
+/// exceeding the maximum number of code units to be written.
+///
+/// Documentation for the buffer usage members:
+/// - __ptr_
+/// The start of the buffer.
+/// - __capacity_
+/// The number of code units that can be written. This means
+/// [__ptr_, __ptr_ + __capacity_) is a valid range to write to.
+/// - __size_
+/// The number of code units written in the buffer. The next code unit will
+/// be written at __ptr_ + __size_. This __size_ may NOT contain the total
+/// number of code units written by the __output_buffer. Whether or not it
+/// does depends on the sub-class used. Typically the total number of code
+/// units written is not interesting. It is interesting for format_to_n which
+/// has its own way to track this number.
+///
+/// Documentation for the buffer changes function:
+/// The subclasses have a function with the following signature:
+///
+/// static void __prepare_write(
+/// __output_buffer<_CharT>& __buffer, size_t __code_units);
+///
+/// This function is called when a write function writes more code units than
+/// the buffer's available space. When a __max_output_size object is provided
+/// the number of code units is the number of code units returned from
+/// the __max_output_size::__write_request function.
+///
+/// - The __buffer contains *this. Since the class containing this function
+/// inherits from __output_buffer it's safe to cast it to the subclass being
+/// used.
+/// - The __code_units is the number of code units the caller will write + 1.
+/// - This value does not take the available space of the buffer into account.
+/// - The push_back function is more efficient when writing before resizing;
+/// this means the buffer should always have room for one code unit. Hence
+/// the + 1 in the size.
+/// - When the function returns there is room for at least one code unit. There
+/// is no requirement there is room for __code_units code units:
+/// - The class has some "bulk" operations. For example, __copy which copies
+/// the contents of a basic_string_view to the output. If the sub-class has
+/// a fixed size buffer the size of the basic_string_view may be larger
+/// than the buffer. In that case it's impossible to honor the requested
+/// size.
+/// - Requiring room for at least one code unit ensures the entire output can
+/// eventually be written. (Obviously making room one code unit at a time is
+/// slow and it's recommended to return a larger available space.)
+/// - When the buffer has room for at least one code unit the function may be
+/// a no-op.
+/// - When the function makes space for more code units it uses one of these
+/// functions to signal the change:
+/// - __buffer_flushed()
+/// - This function is typically used for a fixed sized buffer.
+/// - The current contents of [__ptr_, __ptr_ + __size_) have been
+/// processed.
+/// - __ptr_ remains unchanged.
+/// - __capacity_ remains unchanged.
+/// - __size_ will be set to 0.
+/// - __buffer_moved(_CharT* __ptr, size_t __capacity)
+/// - This function is typically used for a dynamically sized buffer, where
+/// the location of the buffer changes due to reallocations.
+/// - __ptr_ will be set to __ptr. (This value may be the old value of
+/// __ptr_).
+/// - __capacity_ will be set to __capacity. (This value may be the old
+/// value of __capacity_).
----------------
ldionne wrote:
```suggestion
/// value of __capacity_).
```
https://github.com/llvm/llvm-project/pull/108990
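
For readers following the hunk above, the accounting done by __max_output_size can be tried outside libc++ with a small standalone sketch. The names below (max_output_size, write_request, code_units_requested, limited_append) are hypothetical stand-ins chosen for illustration, not the identifiers or the buffer machinery used in the patch. The sketch only assumes the behaviour described in the quoted comments: each request is clamped to the remaining budget, while the full requested size is still counted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for __format::__max_output_size (illustrative names).
class max_output_size {
public:
  explicit max_output_size(std::size_t max_size) : max_size_{max_size} {}

  // Returns how many of the requested code units may still be written and
  // records the full request, including the part that is dropped.
  [[nodiscard]] std::size_t write_request(std::size_t code_units) {
    std::size_t result =
        requested_ < max_size_ ? std::min(code_units, max_size_ - requested_) : 0;
    requested_ += code_units;
    return result;
  }

  // The number of code units that would have been written without a limit.
  [[nodiscard]] std::size_t code_units_requested() const noexcept { return requested_; }

private:
  std::size_t max_size_;
  std::size_t requested_{0};
};

// Hypothetical helper: appends as much of `text` to `out` as the limit allows.
inline void limited_append(std::string& out, max_output_size& limit, std::string_view text) {
  std::size_t allowed = limit.write_request(text.size());
  out.append(text.substr(0, allowed));
}
```

With a limit of 5, appending "abc", "defg" and "hi" in that order should leave "abcde" in the output while the requested count reaches 9, which mirrors the distinction between what format_to_n writes and the size it reports.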