[libcxx-commits] [PATCH] D97705: [RFC][libc++] Improve std::to_chars for base != 10.
Mark de Wever via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Mon Mar 1 10:13:03 PST 2021
Mordante created this revision.
Mordante added reviewers: EricWF, ldionne, mclow.lists, miscco, curdeius.
Mordante requested review of this revision.
Herald added a project: libc++.
Herald added a subscriber: libcxx-commits.
Herald added a reviewer: libc++.
While working on D70631 <https://reviews.llvm.org/D70631> Microsoft's unit tests discovered an issue. This
is an RFC proposal to look at the direction we want to take to solve the
issue. The code is just a proof-of-concept and not intended to be the
I noticed our `to_chars` implementation for bases != 10 uses the range
[first, last) as temporary buffer. This violates the contract for
`to_chars_result to_chars(char* first, char* last, see below value, int base = 10);`
"If the member ec of the return value is such that the value is equal to
the value of a value-initialized errc, the conversion was successful and
the member ptr is the one-past-the-end pointer of the characters
Our implementation modifies the range `[member ptr, last)`, which causes
Microsoft's test to fail. Their test verifies the buffer
`[member ptr, last)` is unchanged. (The test is only done when the
conversion is successful.)
I've some solutions to fix this issue, in basics the solutions all do
- Determine the required output length.
- If the output is too small return an error.
- Else write the data and return success.
I've made 4 benchmarks:
- `to_chars` the current implementation,
- `to_chars_inplace` a fix using the base as function argument,
- `to_chars_inplace_multiplexed` a fix using the base as template argument,
- `to_chars_inplace_multiplexed_for_format` a combination of the two, bases 2, 8, 16 using template arguments the other function arguments.
The implementation of base 10 hasn't been modified.
The sizes of the 4 benchmarks on an amd64 are:
text data bss dec hex filename
226129 2620 4936 233685 390d5 to_chars.libcxx.out
226257 2620 4936 233813 39155 to_chars_inplace.libcxx.out
226641 2620 4936 234197 392d5 to_chars_inplace_multiplexed_for_format.libcxx.out
232289 2620 4936 239845 3a8e5 to_chars_inplace_multiplexed.libcxx.out
The results of the benchmarks are available in this patch. Based on
these results it's clear the multiplexed version has a better
performance at the cost of about 6KB of additional code size. At 0.5KB
we can at least make 2, 8, 16 faster, these bases are common and will be
available in `std::format`.
The inplace version is slower, but faster if the buffer is too small. I
expect that to be a rare case so I don't think these gains are too
important. I didn't test our current algorithm with a stack buffer, but
would expect similar performance as our current algorithm.
1. Use multiplexed for all cases.
2. Use multiplexed for 2, 8, 16 and the others should use the original algorithm, but using a temporary stack buffer.
3. Investigate the effects of inlining the multiplexed versions for 2, 8, and 16 and the others as external template in charconv.cpp. (Leaving base10 unchanged.)
I would propose to use option 3.
When proceeding some additional improvements should be investigated:
- Determining the required number of digits for the values with a power of 2 can probably done faster with bit operations.
- The base 10 version is slower, I expect this to be fixed by using an if statement instead of a case statement.
- There's no code size optimization to share 8, 16, and 32 bit code like it has been done for base 10.
- The base 2 and base 8 might benefit by processing 16 bits per iteration.
rG LLVM Github Monorepo
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 41839 bytes
Desc: not available
More information about the libcxx-commits