[libcxx-commits] [PATCH] D99091: [locale][num_get] Improve Stage 2 of string to float conversion

Wed Apr 7 09:29:51 PDT 2021

tmatheson marked an inline comment as done.
tmatheson added inline comments.

================
Comment at: libcxx/include/locale:524
+
+  if (__ct == __decimal_point) {
+    if (!__in_units)
----------------
miyuki wrote:
> I think it would be better to keep 4-space indentation for consistency with the rest of the file. See https://github.com/llvm/llvm-project/commit/7004d6664efde9d1148ed677649593f989cc6056
I agree, but the CI was failing if I didn't clang-format the patch.

================
Comment at: libcxx/include/locale:581
+    // Not '.' or __thousands_sep or '+' or '-' or 'x' or __exp or digit.
+    // Special handling for the characters in INF/INFINITY/NAN.
+    // These must appear at the start of the sequence, possibly preceeded by + or -.
----------------
miyuki wrote:
> Would it be easier to handle these special cases outside of the loop?
Yes, and I tried but there are two problems with doing so. First it requires an even bigger refactor to get it to work. But more importantly, the standard is pretty over-specified here, to the point of essentially dictating the algorithm to use: https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.2

Therefore while moving these special cases outside of the loop would probably make a better algorithm, it would be a deviation from the standard.

================
Comment at: libcxx/include/locale:1079
     // Stage 2
-    char_type __atoms[32];
+    char_type __atoms[this->__n_atoms_float];
     char_type __decimal_point;
----------------
Quuxplusone wrote:
> miyuki wrote:
> > Quuxplusone wrote:
> > > C++ doesn't support VLAs; you'd have to put something here that's a constant-expression.
> > > Do I understand correctly that the majority of this patch is just changing `32` and `33` to `36` and `37` respectively? Could you just do that in the simplest possible way? E.g. here
> > > ```
> > > -    char_type __atoms[32];
> > > +    char_type __atoms[36];
> > > ```
> > > That'll help focus attention on whatever details are actually important.
> > > 
> > > Meanwhile, the important change seems to be that you're adding those extra 4 characters for `"tTyY"`, so that you can parse not only `"INF"` and `"NAN"` but also `"INFINITY"` (producing `INF`). Is this required by the Standard? Why didn't we have any tests for it before now?
> > > 
> > > I tentatively suggest that you split out the `"INFINITY"` change+test into its own PR (with a summary that cites chapter and verse for why this is needed); and then let's see what remains in this PR after that.
> > I guess
> > ```
> > char_type __atoms[__num_get_base::__n_atoms_float];
> > ```
> > will work. We already have something similar on line 1089:
> > ```
> > unsigned __g[__num_get_base::__num_get_buf_sz];
> > ```
> (For the record, //I// don't want to see `char_type __atoms[__num_get_base::__n_atoms_float]`; I want to see `char_type __atoms[36];`, in a separate PR, with explanation of why we need to parse `INFINITY` as a float, and appropriate tests.)
It seemed slightly more readable with one less magic number, but I can change it. This whole section of code is pretty obscure.

String should be converted to float by the rules of strtold: https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.3
INFINITY is parsed by strtold as described here: https://en.cppreference.com/w/c/string/byte/strtof
(C11 standard, "7.22.1.3 The strtod, strtof, and strtold functions")

I don't know why there are no tests for it, but the tests in these files are pretty minimal and haven't been significantly changed since "libcxx initial import".

I will remove the INFINITY stuff from this PR.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99091/new/

https://reviews.llvm.org/D99091