[libcxx-commits] [PATCH] D103368: [libc++][format] Adds parser std-format-spec.

Sun Jul 11 09:01:02 PDT 2021

vitaut requested changes to this revision.
vitaut added inline comments.
This revision now requires changes to proceed.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:100
+    __scientific_upper_case,
+    __fixed,
+    __general_lower_case,
----------------
Fixed should have lower and uppercase variants too (`f` and `F`).

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:360-362
+    if (__width == 0)
+      __throw_format_error(
+          "A format-spec width field replacement should have a positive value");
----------------
I suggest moving this check into `__substitute_arg_id` which will allow having one check instead of two for the common case of signed width.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:388
+  [[nodiscard]] constexpr bool __has_precision_field() const noexcept {
+    return __precision_as_arg == 0 || __precision != __format::__number_max;
+  }
----------------
So if the user passes a precision of `INT32_MAX` it will be treated as "no precision" (and asserted on in debug mode) which is somewhat arbitrary and may lead to the same problems I mentioned earlier. To avoid this you could use two values above `INT32_MAX` (e.g. `INT32_MAX + 1` and `INT32_MAX + 2` to denote "no precision" and `__precision_as_arg` respectively.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:132
+  // character. Validate what fmt and MSVC have implemented.
+  _CharT __fill{_CharT(' ')};
+
----------------
Mordante wrote:
> vitaut wrote:
> > This should really be a grapheme cluster but {fmt} uses a code point at the moment (grapheme clusterization will be added later). In any case this should be a range (e.g. string_view) and not a single code unit.
> My interpretation of http://eel.is/c++draft/format#string.std-2
>   The fill character can be any character other than { or }.
> is that this is intended to be a single character and not a Unicode grapheme cluster. I can't find a paper or defect report that intends to change the current behaviour. Was it intended to be part of P1868R2 "🦄 width: clarifying units of width and precision in std::format" ?
> 
> Is there a paper or defect report, or should I file a defect report?
> 
> I don't mind changing it, but for now there are some unclarities how to implement this feature:
> Are 2 column Unicode characters allowed?
> - if yes, how to do odd even formatting?
> - if no, then is it a precondition or should the implementation do an width estimate?
> 
> For now I'll leave it as is, there's a `TODO FMT` to address this issue later.
> should I file a defect report?

Please do open a LWG issue. The "character" term is ambiguous but if interpreted in the C++ sense the current wording is clearly wrong because you cannot even use box drawing characters (in the Unicode sense). We can discuss the details in the LWG issue.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:196-201
+    switch (*__begin) {
+    case _CharT('+'):
+    case _CharT('-'):
+    case _CharT(' '):
+      __throw_format_error("Format-spec sign not allowed");
+    }
----------------
Mordante wrote:
> vitaut wrote:
> > I don't think you need to have this check because it will be caught elsewhere anyway. Format specifiers are type-specific and it generally makes more sense to say what is supported rather than what is not. Same for other `__parser_no_*` cases.
> I personally like to report errors as early as possible. So for types where the sign makes no sense, for example strings, I like to report them as soon as they're found. For the other types I indeed still need post processing based on the type.
In addition to being conceptually wrong having these checks penalizes many common cases. I strongly suggest making all `__parser_no_*` cases noops. The only reason the standard wording uses a single grammar for built-in and string types is simplicity but in reality these are separate per-type grammars and should be treated as such. Moreover, those unnecessary checks won't apply to chrono and UDTs potentially adding more confusion.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103368/new/

https://reviews.llvm.org/D103368