[libcxx-commits] [PATCH] D114001: [libc++][format] Adds formatter floating-point.

Sun Jan 16 09:00:27 PST 2022

vitaut added a comment.

Overall looks good, just a bunch of minor(ish) comments inline.

================
Comment at: libcxx/include/__format/formatter_floating_point.h:59
+  to_chars_result __r = _VSTD::to_chars(__first, __last, __value);
+  _LIBCPP_ASSERT(__r.ec == errc(0), "Internal buffer too small");
+  return __r.ptr;
----------------
Can `to_chars` fail for any reason other than small buffer size?

================
Comment at: libcxx/include/__format/formatter_floating_point.h:87
+// the minimal subnormal hexadecimal output's power of 10. Every division of a
+// fraction's binary 1 by 2, requires one additional decimal digit.
+//
----------------
I'd add that many of these fractional digits are zeros and don't have to be generated as a potential future optimization. 

================
Comment at: libcxx/include/__format/formatter_floating_point.h:96
+// radix point             1 code unit
+// __max_precision
+// exponent                1 code unit
----------------
I suggest renaming `__max_precision` to something like `__max_fractional` for consistency with `__max_integral` and since precision can refer to the number of integral and fractional digits (depending on the presentation format).

================
Comment at: libcxx/include/__format/formatter_floating_point.h:97
+// __max_precision
+// exponent                1 code unit
+// sign                    1 code unit
----------------
This is not an exponent but a character that introduces it. Maybe replace with `e/E/p/P`?

================
Comment at: libcxx/include/__format/formatter_floating_point.h:99
+// sign                    1 code unit
+// __max_precision_value
+// -----------------------------------
----------------
This looks like the number of digits in the minimum (subnormal) exponent so I suggest renaming accordingly.

Moreover, I don't think you can produce a number that contains both maximum fractional digits and an exponent because for the former you have to use the fixed format which doesn't have an exponent, e.g. (https://godbolt.org/z/jsrPqsEEb)

```
  fmt::print("{:.1074f}", 4.940656458412e-324);
```
produces:

```
0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004940656458412465441765687928682213723650598026143247644255856825006755072702087518652998363616359923797965646954457177309266567103559397963987747960107818781263007131903114045278458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566867681870395603106249319452715914924553293054565444011274801297099995419319894090804165633245247571478690147267801593552386115501348035264934720193790268107107491703332226844753335720832431936092382893458368060106011506169809753078342277318329247904982524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993664809941164205702637090279242767544565229087538682506419718265533447265625
```
(note that there is no exponent)

And if you use some other format, you'll get an exponent but no leading zeros so buffer requirements will be lower.

Therefore you can remove exponent, sign and __max_precision_value from these computations and simplify the trait.

================
Comment at: libcxx/include/__format/formatter_floating_point.h:99
+// sign                    1 code unit
+// __max_precision_value
+// -----------------------------------
----------------
vitaut wrote:
> This looks like the number of digits in the minimum (subnormal) exponent so I suggest renaming accordingly.
> 
> Moreover, I don't think you can produce a number that contains both maximum fractional digits and an exponent because for the former you have to use the fixed format which doesn't have an exponent, e.g. (https://godbolt.org/z/jsrPqsEEb)
> 
> ```
>   fmt::print("{:.1074f}", 4.940656458412e-324);
> ```
> produces:
> 
> ```
> 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004940656458412465441765687928682213723650598026143247644255856825006755072702087518652998363616359923797965646954457177309266567103559397963987747960107818781263007131903114045278458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566867681870395603106249319452715914924553293054565444011274801297099995419319894090804165633245247571478690147267801593552386115501348035264934720193790268107107491703332226844753335720832431936092382893458368060106011506169809753078342277318329247904982524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993664809941164205702637090279242767544565229087538682506419718265533447265625
> ```
> (note that there is no exponent)
> 
> And if you use some other format, you'll get an exponent but no leading zeros so buffer requirements will be lower.
> 
> Therefore you can remove exponent, sign and __max_precision_value from these computations and simplify the trait.
Also you might want to add formatting of the minimum subnormals to tests because it's an interesting corner case and may reveal buffer issues if the test suite is run with sanitizers.

================
Comment at: libcxx/include/__format/formatter_floating_point.h:161
+    //     The input is always a char buffer, so every char in the buffer needs
+    //     to be converted from '0' to L'0'.
+    if (__precision_ > _Traits::__max_precision) {
----------------
nit: "from '0' to L'0'" -> "from narrow to wide" since the buffer will contain characters other than '0'.

================
Comment at: libcxx/include/__format/formatter_floating_point.h:191
+  int __precision_;
+  int __trailing_zero_{0};
+  size_t __size_;
----------------
`__trailing_zero_ `-> `__num_trailing_zeros_` for consistency with other places and the accessor.

================
Comment at: libcxx/include/__format/formatter_floating_point.h:194
+  char* __begin_;
+  allocator<char> __alloc_;
+  char __buffer_[_Traits::__stack_buffer_size];
----------------
Do you need to store the allocator? Is it stateful?

================
Comment at: libcxx/include/__format/formatter_floating_point.h:300
+
+    // When the exponent isn't found, adjusted it to the proper sentinel.
+    if (__result.__exponent == __last)
----------------
Isn't exponent always produced in hex format?

================
Comment at: libcxx/include/__format/formatter_floating_point.h:392
+
+  // The chars_format::general doesn't add trailing zero's. A printf("%#g") option is not available.
+  __buffer.__remove_trailing_zero();
----------------
zero's -> zeros

Also I don't understand this comment. What does it have to do with `printf` and what do you mean by not available? Do you mean that `#` is not yet implemented here?

================
Comment at: libcxx/include/__format/formatter_floating_point.h:393
+  // The chars_format::general doesn't add trailing zero's. A printf("%#g") option is not available.
+  __buffer.__remove_trailing_zero();
+
----------------
nit: `__remove_trailing_zero` -> `__remove_trailing_zeros` since there can be more than one

================
Comment at: libcxx/include/__format/formatter_floating_point.h:541
+    // - zero-padding may insert additional '0' characters.
+    // Therefore the value is processed as a positive value.
+    // The function @ref __insert_sign will insert a '-' when the value was
----------------
positive -> nonnegative

================
Comment at: libcxx/include/__format/formatter_floating_point.h:552
+    __float_buffer<_Fp> __buffer(__has_precision ? int(this->__precision) : -1);
+    __float_result __result = __format_buffer(__buffer, __value, __negative, __has_precision);
+
----------------
You might want to outline the rest of this function because it doesn't depend on `_Tp`.

================
Comment at: libcxx/include/__format/formatter_floating_point.h:651
+  /// to the number of possible additional work needed to turn this number to
+  /// the proper output the code was litter with tests for upper cases and
+  /// searches for radix points and exponents.
----------------
litter -> littered (although I'm not sure if this historical context about previous implementation is useful at all)

================
Comment at: libcxx/include/__format/formatter_floating_point.h:659-660
+  // TODO FMT _Fp should just be _Tp when to_chars has proper long double support.
+  _LIBCPP_HIDE_FROM_ABI __float_result __format_buffer(__float_buffer<_Fp>& __buffer, _Tp __value, bool __negative,
+                                                       bool __has_precision) {
+    char* __first = __insert_sign(__buffer.begin(), __negative, this->__sign);
----------------
This and almost all other function template here don't need to be defined in a header because they are only instantiated for a few known types (float/double/long double). Moving them out of the header would be beneficial for build times (esp. since you can also move includes).

================
Comment at: libcxx/include/__format/formatter_floating_point.h:126
+  static constexpr int __max_integral = 308;
+  static constexpr int __max_precision = 1074;
+  static constexpr int __max_precision_value = 4;
----------------
Mordante wrote:
> vitaut wrote:
> > You can do better than that by generating significant digits only which gives maximum precision of 767: https://www.exploringbinary.com/maximum-number-of-decimal-digits-in-binary-floating-point-numbers/. This will require replacing `to_chars` with something more efficient/lower-level and can be done later. This will also avoid dynamic memory allocation.
> I had a quick look at that article and it looks interesting, thanks! I'll have a closer look later. Even when I implement this method I still need dynamic allocation. The article doesn't mention long double only IEEE quad. But the number of digits is similar, which means there are over 10000 significant digits possible.
> (I haven't implemented this since `to_chars` casts a `long double` to a `double`. So I could remove the code for a `double`, but then I've to re-add it for `long double`.
> 
> Since `to_chars` is in the dylib it's possible make modifications to its implementation. I've considered before to have a version returning more status information; like the location of the radix point and exponent. I may pursue that at a later time. For now I much rather focus on completing `<format>` and optimizing this header for size and speed.
Leaving it as a future optimization makes sense but I suggest adding a note about not needing to generate zeros (suggested in another comment above).

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:489-490
+  __flags.__zero_padding &= __flags.__alignment == _Flags::_Alignment::__default;
+  if (!__flags.__zero_padding && __flags.__alignment == _Flags::_Alignment::__default)
+    __flags.__alignment = _Flags::_Alignment::__right;
+}
----------------
Why not set alignment unconditionally? Numeric alignment is just a variation of right alignment and you already have `__zero_padding` to check for it.

================
Comment at: libcxx/test/libcxx/utilities/format/format.string/format.string.std/std_format_spec_floating_point.pass.cpp:186
+  test({.width = 2'147'483'647, .width_as_arg = false}, 10, CSTR("2147483647}"));
+  test_exception<Parser<CharT>>("The numeric value of the format-spec is too large", CSTR("2147483648"));
+  test_exception<Parser<CharT>>("The numeric value of the format-spec is too large", CSTR("5000000000"));
----------------
The error message could be improved. First why "the" format-spec and second format-spec doesn't have a numeric value. Something along the lines of "Number (is) of out of range (in a format string)" would be better. Format string can probably be omitted too because it's redundant in `format_error`.

BTW is the error format consistent with the rest of libc++, particularly is the first word normally capitalized and do they use standardese terms?

================
Comment at: libcxx/test/libcxx/utilities/format/format.string/format.string.std/std_format_spec_floating_point.pass.cpp:190
+
+  test_exception<Parser<CharT>>("End of input while parsing format-spec arg-id", CSTR("{"));
+  test_exception<Parser<CharT>>("Invalid arg-id", CSTR("{0"));
----------------
This error message can be improved too because the fact that you are parsing arg-id is an implementation detail that doesn't help user. A more helpful message would say something about unmatched `{`, for example:

```
>>> "{".format(42)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Single '{' encountered in format string
```

I'd recommend making a pass over error messages in a separate diff and making them a bit more user-friendly, possibly looking how similar errors are reported in python which does a decent job.

================
Comment at: libcxx/test/std/utilities/format/format.functions/locale-specific_form.pass.cpp:943
+  test(STR("$-1.234560P+3$$"), en_US, STR("{:$^15.6LA}"), F(-0x1.23456p3));
+  test(STR("-0001.234560P+3"), en_US, STR("{:015.6LA}"), F(-0x1.23456p3));
+
----------------
Do you test that hex digits are capitalized with `A` somewhere?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114001/new/

https://reviews.llvm.org/D114001