[libcxx-commits] [PATCH] D125606: [libc++][format] Improve string formatters

Sun Jun 5 11:35:27 PDT 2022

Mordante marked 4 inline comments as done.
Mordante added a comment.

Thanks for the review!

In D125606#3559009 <https://reviews.llvm.org/D125606#3559009>, @vitaut wrote:

> Looks great! My main suggestion is to rename `__parser` into something more appropriate (see an inline comment) and possibly separate the actual parsing logic from the parser output.

I'll investigate that direction.

================
Comment at: libcxx/include/__format/formatter_output.h:79
+/// \param __out_it    The output iterator to write to.
+/// \param __parser    The parsed formatting output settings.
+/// \param __size      The (estimated) output column width. When the elements
----------------
vitaut wrote:
> Normally parser is something that does parsing. "The parsed formatting output settings" sounds more like format specs so I'd suggest renaming accordingly and making parser return these specs.
Interesting observation. I'll give this some thought.

================
Comment at: libcxx/include/__format/formatter_output.h:118-131
+  auto __last = __format_spec::__detail::__estimate_column_width_fast(__str.begin(), __str.end());
+  ptrdiff_t __size = __last - __str.begin();
+  if (__size >= __parser.__width_)
+    return _VSTD::copy(__str.begin(), __str.end(), _VSTD::move(__out_it));
+
+  // Is there a non Unicode part?
+  if (__last != __str.end()) {
----------------
vitaut wrote:
> Not specific to this diff but I think it would be cleaner to fold Unicode/non-Unicode handling into `__estimate_column_width` and have a single check and copy here.
I like this suggestion, but I'll leave the code as is in this patch.
I noticed this code doesn't do grapheme clustering at all.
I've been working on that in D126971, so it makes more sense to implement those changes in that patch.

================
Comment at: libcxx/include/__format/formatter_string.h:35
+template <class _CharT>
+_LIBCPP_HIDE_FROM_ABI auto __format_string_view(basic_string_view<_CharT> __str,
+                                                output_iterator<const _CharT&> auto __out_it,
----------------
vitaut wrote:
> What is this function for? Why not use `__write_unicode` directly?
I did that to be consistent with the other new formatters, but this one can indeed be removed.

================
Comment at: libcxx/include/__format/formatter_string.h:81
     // TODO FMT Implement these improvements.
-    if (this->__has_width_field() || this->__has_precision_field())
-      return _Base::format(__str, __ctx);
+    if (_Base::__parser_.__width_as_arg_ || _Base::__parser_.__precision_as_arg_ || _Base::__parser_.__has_width() ||
+        _Base::__parser_.__has_precision())
----------------
vitaut wrote:
> Premature optimization: you could use negative values to indicate "no width/precision" and avoid extra checks here.
That won't work. Here I need to check for both cases: a value that was set directly and a value supplied through a formatting argument. At this point the formatting arguments haven't been resolved yet.

However, when a formatting argument used for the width may contain the value 0, it would be easier to resolve this width before validating its value.
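
To make the two cases concrete, here is a minimal sketch. All names (`spec`, `make_spec`, `needs_slow_path`) and the sentinel values are assumptions for illustration, not the actual libc++ declarations. The point is that the same field holds either a literal value or a still unresolved arg-id, and arg-id 0 is valid, so the flag has to be checked alongside the value.

```cpp
#include <cstdint>

// Hypothetical stand-in for the parsed format-spec; not the libc++ type.
struct spec {
  bool width_as_arg = false;     // true: `width` holds an arg-id, not a value
  bool precision_as_arg = false; // true: `precision` holds an arg-id
  std::uint32_t width = 0;       // assumed: 0 means "no width" for a literal
  std::int32_t precision = -1;   // assumed: -1 means "no precision" for a literal

  constexpr bool has_width() const { return width > 0; }
  constexpr bool has_precision() const { return precision >= 0; }
};

// Hypothetical helper to build a spec with only the width fields set.
constexpr spec make_spec(bool width_as_arg, std::uint32_t width) {
  spec s;
  s.width_as_arg = width_as_arg;
  s.width = width;
  return s;
}

// Same shape as the check in the diff: a literal value or an unresolved
// argument for either field selects the general formatting path.
constexpr bool needs_slow_path(const spec& s) {
  return s.width_as_arg || s.precision_as_arg || s.has_width() || s.has_precision();
}

// A width taken from arg-id 0 stores 0 in `width`; only the flag reveals it.
static_assert(needs_slow_path(make_spec(true, 0)), "arg-id 0 must be detected");
static_assert(!needs_slow_path(spec{}), "no width or precision: fast path");
```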

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:1528
+  bool __alternate_form_ : 1 {false};
+  bool __zero_padding_ : 1 {false};
+  bool __locale_specific_form_ : 1 {false};
----------------
vitaut wrote:
> Another option is to fold this into fill/alignment (this would require an additional alignment enumerator making it larger). Not suggesting to change, just something to consider.
I considered that for the original parser, but it didn't work out in that design. I don't see why it can't work in this design, though. I like the idea since it makes the code for setting the zero-padding alignment cleaner: during parsing I can test whether another alignment is set instead of having "uglier" post-processing code.

Indeed the number of bits required to store the data remains the same.
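
A rough sketch of that idea follows. The enum and function names are hypothetical, and it assumes the existing alignment has four enumerators. Two bits of alignment plus a one-bit flag become three bits of alignment, and the "0" handling turns into a single test during parsing. That test follows the standard's rule that the `0` option is ignored when an align option is present.

```cpp
#include <cstdint>

// Hypothetical names; a sketch of the idea, not the libc++ declarations.
// Assumed before: 4 alignment enumerators (2 bits) + 1 zero-padding bit = 3 bits.
// After:          5 alignment enumerators                               = 3 bits.
enum class alignment : std::uint8_t {
  default_,
  left,
  center,
  right,
  zero_padding // the additional enumerator replacing the separate flag
};

// Called when the parser encounters the '0' option. Zero padding is only
// selected when no explicit alignment was parsed before it.
constexpr alignment on_zero_flag(alignment current) {
  return current == alignment::default_ ? alignment::zero_padding : current;
}

static_assert(on_zero_flag(alignment::default_) == alignment::zero_padding,
              "'0' without an align option selects zero padding");
static_assert(on_zero_flag(alignment::right) == alignment::right,
              "'0' after an align option is ignored");
```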

Thanks for the suggestion!

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:1689
+    case 'A':
+      __type_ = __type::__float_hexadecimal_upper_case;
+      break;
----------------
vitaut wrote:
> nit: maybe replace `float_hexadecimal` with `hexfloat` for consistency with the naming in the standard (e.g. https://en.cppreference.com/w/cpp/io/manip/fixed)? The other names seem to be already consistent.
Good point.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:1750-1751
+    __width_ = __format_spec::__substitute_arg_id(__arg);
+    if (__width_ == 0)
+      __throw_format_error("A format-spec width field replacement should have a positive value");
+  }
----------------
vitaut wrote:
> This is a weird check. I think we should open an LWG issue to allow zero dynamic width and remove this unnecessary check.
Currently width is specified as
```
width:
    positive-integer
    { arg-id(opt) }
```

Do you want to change that to `nonnegative-integer` as well? If not, the question is what a width of `0` should do.
Note that `__format_spec::__substitute_arg_id(__arg);` already throws on negative values.

Another option would be to pass the minimum as an argument to `__substitute_arg_id` so it can validate the proper boundary there.
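
A minimal sketch of that last option, under stated assumptions: `checked_arg_value` is a hypothetical helper that takes an already extracted integer instead of a format argument, and `std::invalid_argument` stands in for the format error type.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper; the real __substitute_arg_id takes a format argument.
// The caller supplies the lower bound, so width and precision can share the
// code while validating different boundaries.
inline std::uint32_t checked_arg_value(long long value, std::uint32_t minimum) {
  if (value < 0)
    throw std::invalid_argument("replacement value is negative");
  if (value > static_cast<long long>(std::numeric_limits<std::uint32_t>::max()))
    throw std::invalid_argument("replacement value is too large");
  if (static_cast<std::uint32_t>(value) < minimum)
    throw std::invalid_argument("replacement value is below the minimum");
  return static_cast<std::uint32_t>(value);
}
```

With this shape the width would be resolved with a minimum of 1 (or 0 if the grammar changes) and the precision with a minimum of 0.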

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:1757
+
+    __precision_as_arg_ = false;
+    __precision_ = __format_spec::__substitute_arg_id(__arg);
----------------
vitaut wrote:
> nit: why assign `false` here but 0 to `__width_as_arg_` in `__substitute_width_arg_id` above? Make this consistent?
I originally used `0` and `1` but switched to `false` and `true` later. It seems I missed one place.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125606/new/

https://reviews.llvm.org/D125606