[libcxx-commits] [PATCH] D103368: [libc++][format] Adds parser std-format-spec.

Wed Jul 14 11:12:38 PDT 2021

Mordante marked 6 inline comments as done.
Mordante added inline comments.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:100
+    __scientific_upper_case,
+    __fixed,
+    __general_lower_case,
----------------
vitaut wrote:
> Mordante wrote:
> > vitaut wrote:
> > > Fixed should have lower and uppercase variants too (`f` and `F`).
> > The upper and lower case are formatted the same. So the the type function maps both `f` and `F` to fixed.
> Not quite, NaN and infinity are formatted differently.
I forgot about these cases.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:360-362
+    if (__width == 0)
+      __throw_format_error(
+          "A format-spec width field replacement should have a positive value");
----------------
vitaut wrote:
> I suggest moving this check into `__substitute_arg_id` which will allow having one check instead of two for the common case of signed width.
`__substitute_arg_id` is also used to substitute the precision. A precision of `0` is valid. I'll give it some thought, but at the moment I don't see a good alternative.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:388
+  [[nodiscard]] constexpr bool __has_precision_field() const noexcept {
+    return __precision_as_arg == 0 || __precision != __format::__number_max;
+  }
----------------
vitaut wrote:
> Mordante wrote:
> > vitaut wrote:
> > > So if the user passes a precision of `INT32_MAX` it will be treated as "no precision" (and asserted on in debug mode) which is somewhat arbitrary and may lead to the same problems I mentioned earlier. To avoid this you could use two values above `INT32_MAX` (e.g. `INT32_MAX + 1` and `INT32_MAX + 2` to denote "no precision" and `__precision_as_arg` respectively.
> > I guess I need more comment for this code ;-)
> > When the format-spec has a value `__precision_as_arg == 0`.
> > When the format-spec has an arg-id all arg-id's except for `INT32_MAX` are a valid arg-id. If the arg-id gets resolved to `INT32_MAX` this value is considered a limit.
> In other words `INT32_MAX` is already handled correctly with the current approach or have I misunderstood your comment?
Yes `INT32_MAX` is currently handled correctly.

================
Comment at: libcxx/include/__format/parser_std_format_spec.h:196-201
+    switch (*__begin) {
+    case _CharT('+'):
+    case _CharT('-'):
+    case _CharT(' '):
+      __throw_format_error("Format-spec sign not allowed");
+    }
----------------
vitaut wrote:
> Mordante wrote:
> > vitaut wrote:
> > > I don't think you need to have this check because it will be caught elsewhere anyway. Format specifiers are type-specific and it generally makes more sense to say what is supported rather than what is not. Same for other `__parser_no_*` cases.
> > I personally like to report errors as early as possible. So for types where the sign makes no sense, for example strings, I like to report them as soon as they're found. For the other types I indeed still need post processing based on the type.
> In addition to being conceptually wrong having these checks penalizes many common cases. I strongly suggest making all `__parser_no_*` cases noops. The only reason the standard wording uses a single grammar for built-in and string types is simplicity but in reality these are separate per-type grammars and should be treated as such. Moreover, those unnecessary checks won't apply to chrono and UDTs potentially adding more confusion.
Thanks for the additional explanation. I now see what you meant before. For now I'll make them noops. I'll also add a TODO because with this new insight I might want to look at a cleaner solution later.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103368/new/

https://reviews.llvm.org/D103368