[libcxx-commits] [PATCH] D114001: [libc++][format] Adds formatter floating-point.

Mon Jan 17 11:47:11 PST 2022

Mordante marked 16 inline comments as done.
Mordante added a comment.

Thanks for the review comments. I've answered the first batch. The others need a bit more investigation.

================
Comment at: libcxx/include/__format/formatter_floating_point.h:59
+  to_chars_result __r = _VSTD::to_chars(__first, __last, __value);
+  _LIBCPP_ASSERT(__r.ec == errc(0), "Internal buffer too small");
+  return __r.ptr;
----------------
vitaut wrote:
> Can `to_chars` fail for any reason other than small buffer size?
No. According to the Standard the only error it can return is `errc::value_too_large`. http://eel.is/c++draft/charconv#to.chars-1

================
Comment at: libcxx/include/__format/formatter_floating_point.h:87
+// the minimal subnormal hexadecimal output's power of 10. Every division of a
+// fraction's binary 1 by 2, requires one additional decimal digit.
+//
----------------
vitaut wrote:
> I'd add that many of these fractional digits are zeros and don't have to be generated as a potential future optimization. 
I've added a todo and a link to the article you shared earlier.

================
Comment at: libcxx/include/__format/formatter_floating_point.h:97
+// __max_precision
+// exponent                1 code unit
+// sign                    1 code unit
----------------
vitaut wrote:
> This is not an exponent but a character that introduces it. Maybe replace with `e/E/p/P`?
I've changed it to exponent character, thinks that's clearer than only the letters.

================
Comment at: libcxx/include/__format/formatter_floating_point.h:161
+    //     The input is always a char buffer, so every char in the buffer needs
+    //     to be converted from '0' to L'0'.
+    if (__precision_ > _Traits::__max_precision) {
----------------
vitaut wrote:
> nit: "from '0' to L'0'" -> "from narrow to wide" since the buffer will contain characters other than '0'.
Actually this is only about zeros. This explains why the digits that are always zero at the end aren't stored in the buffer. Still I think changing the wording to avoid the value is a good suggestion.

================
Comment at: libcxx/include/__format/formatter_floating_point.h:191
+  int __precision_;
+  int __trailing_zero_{0};
+  size_t __size_;
----------------
vitaut wrote:
> `__trailing_zero_ `-> `__num_trailing_zeros_` for consistency with other places and the accessor.
Seems I missed updating them after changing the accessor :-(

================
Comment at: libcxx/include/__format/formatter_floating_point.h:194
+  char* __begin_;
+  allocator<char> __alloc_;
+  char __buffer_[_Traits::__stack_buffer_size];
----------------
vitaut wrote:
> Do you need to store the allocator? Is it stateful?
Fair point, it's the default so it should be stateless.

================
Comment at: libcxx/include/__format/formatter_floating_point.h:651
+  /// to the number of possible additional work needed to turn this number to
+  /// the proper output the code was litter with tests for upper cases and
+  /// searches for radix points and exponents.
----------------
vitaut wrote:
> litter -> littered (although I'm not sure if this historical context about previous implementation is useful at all)
I like to keep the historic comment, it explains why the current approach was chosen.

================
Comment at: libcxx/include/__format/formatter_floating_point.h:659-660
+  // TODO FMT _Fp should just be _Tp when to_chars has proper long double support.
+  _LIBCPP_HIDE_FROM_ABI __float_result __format_buffer(__float_buffer<_Fp>& __buffer, _Tp __value, bool __negative,
+                                                       bool __has_precision) {
+    char* __first = __insert_sign(__buffer.begin(), __negative, this->__sign);
----------------
vitaut wrote:
> This and almost all other function template here don't need to be defined in a header because they are only instantiated for a few known types (float/double/long double). Moving them out of the header would be beneficial for build times (esp. since you can also move includes).
This and other code could be moved to the dylib. There's one big disadvantage of the dylib. As soon as the function is exported in the dylib its ABI is frozen. As long as the code is still under development I don't want to freeze the ABI. Especially this function will change once we have proper long double support. Moving code to the dylib is on my todo list, but I want to wait until the code has matured a bit further.

For example the `this->__sign` on the next line will change when the formatter no longer inherits from a parser. That will be impossible when I move it to the dylib now.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:489-490
+  __flags.__zero_padding &= __flags.__alignment == _Flags::_Alignment::__default;
+  if (!__flags.__zero_padding && __flags.__alignment == _Flags::_Alignment::__default)
+    __flags.__alignment = _Flags::_Alignment::__right;
+}
----------------
vitaut wrote:
> Why not set alignment unconditionally? Numeric alignment is just a variation of right alignment and you already have `__zero_padding` to check for it.
There are some catches with with alignment and zero padding. When the user selects zero padding and sets the alignment to zero padding is ignored. When parsing it's not certain what the default alignment will be, for example with a boolean it depends on the display type.

I agree this code isn't the cleanest and I'm not happy with it. I still want to look at another solution. I even wonder whether I want to implement the old '=' alignment internally. But before changing it I also want to look at how you implemented it.

================
Comment at: libcxx/test/libcxx/utilities/format/format.string/format.string.std/std_format_spec_floating_point.pass.cpp:186
+  test({.width = 2'147'483'647, .width_as_arg = false}, 10, CSTR("2147483647}"));
+  test_exception<Parser<CharT>>("The numeric value of the format-spec is too large", CSTR("2147483648"));
+  test_exception<Parser<CharT>>("The numeric value of the format-spec is too large", CSTR("5000000000"));
----------------
vitaut wrote:
> The error message could be improved. First why "the" format-spec and second format-spec doesn't have a numeric value. Something along the lines of "Number (is) of out of range (in a format string)" would be better. Format string can probably be omitted too because it's redundant in `format_error`.
> 
> BTW is the error format consistent with the rest of libc++, particularly is the first word normally capitalized and do they use standardese terms?
For now I'll keep this message, it's also used for all existing parsers where the width exceeds the maximum supported value.

================
Comment at: libcxx/test/libcxx/utilities/format/format.string/format.string.std/std_format_spec_floating_point.pass.cpp:190
+
+  test_exception<Parser<CharT>>("End of input while parsing format-spec arg-id", CSTR("{"));
+  test_exception<Parser<CharT>>("Invalid arg-id", CSTR("{0"));
----------------
vitaut wrote:
> This error message can be improved too because the fact that you are parsing arg-id is an implementation detail that doesn't help user. A more helpful message would say something about unmatched `{`, for example:
> 
> ```
> >>> "{".format(42)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: Single '{' encountered in format string
> ```
> 
> I'd recommend making a pass over error messages in a separate diff and making them a bit more user-friendly, possibly looking how similar errors are reported in python which does a decent job.
I agree. I already had a talk with some of the other libc++ maintainers on Discord regarding this subject. I like your suggestion to look at python. But indeed that will be a separate diff.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114001/new/

https://reviews.llvm.org/D114001