[libcxx-commits] [PATCH] D103368: [libc++][format] Adds parser std-format-spec.

Sun Jun 27 06:42:37 PDT 2021

Mordante marked 13 inline comments as done.
Mordante added inline comments.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:132
+  // character. Validate what fmt and MSVC have implemented.
+  _CharT __fill{_CharT(' ')};
+
----------------
vitaut wrote:
> This should really be a grapheme cluster but {fmt} uses a code point at the moment (grapheme clusterization will be added later). In any case this should be a range (e.g. string_view) and not a single code unit.
My interpretation of http://eel.is/c++draft/format#string.std-2
  The fill character can be any character other than { or }.
is that this is intended to be a single character and not a Unicode grapheme cluster. I can't find a paper or defect report that intends to change the current behaviour. Was it intended to be part of P1868R2 "🦄 width: clarifying units of width and precision in std::format" ?

Is there a paper or defect report, or should I file a defect report?

I don't mind changing it, but for now there are some unclarities how to implement this feature:
Are 2 column Unicode characters allowed?
- if yes, how to do odd even formatting?
- if no, then is it a precondition or should the implementation do an width estimate?

For now I'll leave it as is, there's a `TODO FMT` to address this issue later.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:138
+    _LIBCPP_ASSERT(__begin != __end, "Precondition failure");
+    if (__begin + 1 != __end) {
+      if (__detail::__parse_alignment(*(__begin + 1), __flags)) {
----------------
vitaut wrote:
> This will have to change for the same reason.
Likewise will be done later.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:196-201
+    switch (*__begin) {
+    case _CharT('+'):
+    case _CharT('-'):
+    case _CharT(' '):
+      __throw_format_error("Format-spec sign not allowed");
+    }
----------------
vitaut wrote:
> I don't think you need to have this check because it will be caught elsewhere anyway. Format specifiers are type-specific and it generally makes more sense to say what is supported rather than what is not. Same for other `__parser_no_*` cases.
I personally like to report errors as early as possible. So for types where the sign makes no sense, for example strings, I like to report them as soon as they're found. For the other types I indeed still need post processing based on the type.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:277-279
+  if (*__r.first == _CharT(':'))
+    __throw_format_error(
+        "Format-spec arg-id shouldn't contain its own format-spec");
----------------
vitaut wrote:
> This is redundant and only adds an overhead without substantially improving diagnostics. The error message is not particularly correct either.
Removed. The `if (__r.first == __end)` validation in this function has also been removed.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:307-308
+              static_cast<_CT>(__format::__number_max))
+            __throw_format_error("Format-spec arg-id replacement "
+                                 "exceeds maximum supported value");
+          return __arg;
----------------
vitaut wrote:
> It might be better to just return a large value and let the error be reported elsewhere (when the ID is actually used). It's both more efficient and gives better diagnostic without referring to internal magic limits.
I think that can lead to errors. When `_Type` is `uint64_t` containing `UINT32_MAX + 1` it will return `1` without this test.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:373
+public:
+  uint32_t __precision : 31 {__format::__number_max};
+  uint32_t __precision_as_arg : 1 {0};
----------------
vitaut wrote:
> Note that any nonnegative `int32_t` value is a valid precision so I'm not sure `__number_max` is OK here.
This was on "the 999.999.999 maximum"-will-never-be-used-in-practice assumption. I've adjusted the code to handle every non-negative `int32_t`.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:556-558
+ * @note The standard doesn't mandate parsing to be constexpr this is an
+ * extension. https://wg21.link/P2216 will mandate the parsing happens at
+ * compile time.
----------------
vitaut wrote:
> It does now.
true :-)

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:582
+   * directly after the ':'.
+   * @pre The std-format-spec is terminated by a '}'. The standard allows to
+   * terminate at __parse_ctx.end(), but the replacement string shall end with
----------------
vitaut wrote:
> This isn't correct. `std-format-spec` doesn't have to be terminated by anything. `formatter::parse` can be used to parse standalone format specs. `}` is part of `replacement-field` and should be handled there.
Yes that is now handled in the replacement field of the previous patch.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103368/new/

https://reviews.llvm.org/D103368