[libcxx-commits] [PATCH] D114001: [libc++][format] Adds formatter floating-point.

Mon Dec 27 10:33:13 PST 2021

Mordante planned changes to this revision.
Mordante marked 10 inline comments as done.
Mordante added a comment.

Thanks both your reviews!

================
Comment at: libcxx/benchmarks/formatter_float.bench.cpp:91-92
+    // std::numeric_limits<F>::max() - -std::numeric_limits<F>::max()
+    std::uniform_real_distribution<F> distribution(-std::numeric_limits<F>::max() / F(2.1),
+                                                   std::numeric_limits<F>::max() / F(2.1));
+
----------------
vitaut wrote:
> I am not sure if uniform distribution over all FP numbers is very meaningful for benchmarking. The perf of FP formatting methods often depends on the number of significant digits which for these inputs will likely be leaning towards the maximum. A better way of generating inputs can be found in https://github.com/miloyip/dtoa-benchmark.
I'll have a look at this later.

================
Comment at: libcxx/benchmarks/formatter_float.bench.cpp:200
+struct Precision<PrecisionE::Small> {
+  static constexpr const char* fmt = ".6";
+};
----------------
vitaut wrote:
> Note that 6 is the default for fixed, general and scientific so this test is effectively the same as None on these formats.
Good point.

================
Comment at: libcxx/include/__format/formatter.h:209
+                                   __format_spec::_Flags::_Alignment __alignment, const _CharT* __exponent,
+                                   size_t __trailing_zero) -> decltype(__out_it) {
+  _LIBCPP_ASSERT(__first <= __last, "Not a valid range");
----------------
vitaut wrote:
> nit: I'd suggest renaming `__trailing_zero` to something reflecting that it's **the number of** trailing zeros, e.g. `__num_trailing_zeros` or at least `__trailing_zeros`. Right now it looks like a flag indicating whether to write trailing zeros.
Good point, I'll use `__num_trailing_zeros` as suggested.

================
Comment at: libcxx/include/__format/formatter_floating_point.h:27
+#include <__format/formatter.h>
+#include <__format/formatter_integral.h>
+#include <__utility/move.h>
----------------
vitaut wrote:
> What is this for?
It uses some parts in this header. I probably want to refactor this later.
(When making the formatter's format functions const members and address the fill character allowing a code unit or grapheme cluster I want to look at a slightly different approach for the parser. But it depends on the outcome of the fill character's LWG issue.)

================
Comment at: libcxx/include/__format/formatter_floating_point.h:50
+// If the compiler has no concepts support, the format header will be disabled.
+// Without concepts support enable_if needs to be used and that too much effort
+// to support compilers with partial C++20 support.
----------------
ashermancinelli wrote:
> 
I'll leave this one, this block is copy-pasted several times. I actually hope we soon only support compilers with proper concepts support so I can remove this guard from a lot of files.

================
Comment at: libcxx/include/__format/formatter_floating_point.h:126
+  static constexpr int __max_integral = 308;
+  static constexpr int __max_precision = 1074;
+  static constexpr int __max_precision_value = 4;
----------------
vitaut wrote:
> You can do better than that by generating significant digits only which gives maximum precision of 767: https://www.exploringbinary.com/maximum-number-of-decimal-digits-in-binary-floating-point-numbers/. This will require replacing `to_chars` with something more efficient/lower-level and can be done later. This will also avoid dynamic memory allocation.
I had a quick look at that article and it looks interesting, thanks! I'll have a closer look later. Even when I implement this method I still need dynamic allocation. The article doesn't mention long double only IEEE quad. But the number of digits is similar, which means there are over 10000 significant digits possible.
(I haven't implemented this since `to_chars` casts a `long double` to a `double`. So I could remove the code for a `double`, but then I've to re-add it for `long double`.

Since `to_chars` is in the dylib it's possible make modifications to its implementation. I've considered before to have a version returning more status information; like the location of the radix point and exponent. I may pursue that at a later time. For now I much rather focus on completing `<format>` and optimizing this header for size and speed.

================
Comment at: libcxx/include/__format/formatter_floating_point.h:238-243
+  __result.__exponent = __find_exponent(__result.__integral, __result.__last);
+
+  // Constrains:
+  // - There's at least one decimal digit before the radix point.
+  // - The radix point, when present, is placed before the exponent.
+  __result.__radix_point = _VSTD::find(__result.__integral + 1, __result.__exponent, '.');
----------------
vitaut wrote:
> Switching away from `to_chars` will also get rid of this reparsing overhead.
I agree, but that will be quite a bit of effort. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114001/new/

https://reviews.llvm.org/D114001