[libcxx-commits] [PATCH] D99091: [locale][num_get] Improve Stage 2 of string to float conversion
Tomas Matheson via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Wed Apr 7 09:29:51 PDT 2021
tmatheson marked an inline comment as done.
tmatheson added inline comments.
================
Comment at: libcxx/include/locale:524
+
+ if (__ct == __decimal_point) {
+ if (!__in_units)
----------------
miyuki wrote:
> I think it would be better to keep 4-space indentation for consistency with the rest of the file. See https://github.com/llvm/llvm-project/commit/7004d6664efde9d1148ed677649593f989cc6056
I agree, but the CI was failing if I didn't clang-format the patch.
================
Comment at: libcxx/include/locale:581
+ // Not '.' or __thousands_sep or '+' or '-' or 'x' or __exp or digit.
+ // Special handling for the characters in INF/INFINITY/NAN.
+ // These must appear at the start of the sequence, possibly preceeded by + or -.
----------------
miyuki wrote:
> Would it be easier to handle these special cases outside of the loop?
Yes, and I tried but there are two problems with doing so. First it requires an even bigger refactor to get it to work. But more importantly, the standard is pretty over-specified here, to the point of essentially dictating the algorithm to use: https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.2
Therefore while moving these special cases outside of the loop would probably make a better algorithm, it would be a deviation from the standard.
================
Comment at: libcxx/include/locale:1079
// Stage 2
- char_type __atoms[32];
+ char_type __atoms[this->__n_atoms_float];
char_type __decimal_point;
----------------
Quuxplusone wrote:
> miyuki wrote:
> > Quuxplusone wrote:
> > > C++ doesn't support VLAs; you'd have to put something here that's a constant-expression.
> > > Do I understand correctly that the majority of this patch is just changing `32` and `33` to `36` and `37` respectively? Could you just do that in the simplest possible way? E.g. here
> > > ```
> > > - char_type __atoms[32];
> > > + char_type __atoms[36];
> > > ```
> > > That'll help focus attention on whatever details are actually important.
> > >
> > > Meanwhile, the important change seems to be that you're adding those extra 4 characters for `"tTyY"`, so that you can parse not only `"INF"` and `"NAN"` but also `"INFINITY"` (producing `INF`). Is this required by the Standard? Why didn't we have any tests for it before now?
> > >
> > > I tentatively suggest that you split out the `"INFINITY"` change+test into its own PR (with a summary that cites chapter and verse for why this is needed); and then let's see what remains in this PR after that.
> > I guess
> > ```
> > char_type __atoms[__num_get_base::__n_atoms_float];
> > ```
> > will work. We already have something similar on line 1089:
> > ```
> > unsigned __g[__num_get_base::__num_get_buf_sz];
> > ```
> (For the record, //I// don't want to see `char_type __atoms[__num_get_base::__n_atoms_float]`; I want to see `char_type __atoms[36];`, in a separate PR, with explanation of why we need to parse `INFINITY` as a float, and appropriate tests.)
It seemed slightly more readable with one less magic number, but I can change it. This whole section of code is pretty obscure.
String should be converted to float by the rules of strtold: https://timsong-cpp.github.io/cppwp/n4140/facet.num.get.virtuals#3.3.3
INFINITY is parsed by strtold as described here: https://en.cppreference.com/w/c/string/byte/strtof
(C11 standard, "7.22.1.3 The strtod, strtof, and strtold functions")
I don't know why there are no tests for it, but the tests in these files are pretty minimal and haven't been significantly changed since "libcxx initial import".
I will remove the INFINITY stuff from this PR.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D99091/new/
https://reviews.llvm.org/D99091
More information about the libcxx-commits
mailing list