[PATCH] D45753: Lift JSON library from clang-tools-extra/clangd to llvm/Support.

Fri Apr 27 14:13:49 PDT 2018

sammccall added inline comments.

================
Comment at: include/llvm/Support/JSON.h:355
+
+  // "cheating" move-constructor for moving from initializer_list.
+  ObjectKey(const ObjectKey &&V) {
----------------
Meinersbur wrote:
> sammccall wrote:
> > Meinersbur wrote:
> > > Can you elaborate on why this is needed? AFAIK `std::initializer_list`s are not meant to be moved from.
> > I was able to mitigate this, eliminating the const-rvalue-reference constructors. (This optimization isn't important for ObjectKey, and for Object I friended the relevant classes instead.)
> > 
> > The reasoning here is roughly:
> >  - we need some syntax to support many KV pairs for an object, or elements in an array
> >  - function syntax fails because it doesn't format well in long-list cases. Clang-format does a better job if lists are braced lists, in particular it offers you the ability to force one-per-line with a trailing comma in the list.
> >  - a variadic constructor fails because this must be templated on the arg type, which means args can't be braced list expressions themselves, as those do not have a deducible type. This would hurt map-like object-literal syntax...
> >  - so we're left with the `std::initializer_list` constructor as the way to pass variable numbers of arguments, in a way that formats nicely, and allows them to be coerced to a chosen type. However `initializer_list` acts like a container of `const<T>`, which would mean naively json::Value{{{{{1}}}}} would result in a deep copy at every level of nesting. Fortunately the standard spells out enough of how the contents of init-lists are constructed that moving the data out of them seems well-defined.
> > 
> > I'm not sure how much of this stuff belongs in the comments here - it's more design doc than user guide.
> `std::initializer_list` acting like a container of const elements is probably for a reason. I'd prefer no such hacks, but also see that endless copying of elements might justify cheating.
> 
> [[ http://en.cppreference.com/w/cpp/utility/initializer_list | cppreference.com ]] mentions that its elements must be copy-initialized, but only since C++17.
> 
> does anyone else have an opinion on this?
I would also prefer no such hacks, but I can't see a way to get good syntax without the recursive copy, and without the hacks :(
Any ideas? Also interested in hearing more opinions on whether this is too hairy to rely on.

That said, I believe this is valid, per all relevant versions of the standard.

> cppreference.com mentions that its elements must be copy-initialized, but only since C++17.
I think I'm missing your point here - can you explain why this is good or bad, and what the implication is? It also seems to be a cppreference mistake, the copy-initialized requirement is older.

C++11 says
> as if the implementation allocated an array of N elements of type E [...] Each element of that array is copy-initialized
i.e. it creates a `T[N]`. We could probably even `const_cast`!

C++14 says
> as if the implementation allocated a temporary array of N elements of type **const E** [...]. Each element of that array is copy-initialized [...] The implementation is free to allocate the array in read-only memory **if an explicit array with the same initializer could be so allocated**
(emphasis mine). So now the type changes to `const T[N]` (so `const_cast` would be invalid) but mutating mutable members of const objects is allowed, so read-only memory can't be used.

The C++17 language is more obscure
> as if the implementation generated and materialized (7.4) a prvalue of type “array of N const E” [...] Each element of that array is copy-initialized
but seems to be the same for these purposes.

================
Comment at: lib/Support/JSON.cpp:560
+    case T_Number:
+      OS << format("%g", as<double>());
+      break;
----------------
simon_tatham wrote:
> sammccall wrote:
> > simon_tatham wrote:
> > > Could this number formatting be changed? The default %g loses precision – you don't even get enough information to exactly reconstruct the same double you started with.
> > > 
> > > Also, over in D46054 I'm working on a JSON back end for TableGen, for which I'd find it useful to be able to pass an arbitrary 64-bit integer through this system and still have the full 64 bits of integer value visible in the JSON output file, for the benefit of JSON consumers (e.g. Python `json.load`) that go above the call of duty in returning it as an integer without rounding it to the nearest representable double.
> > > 
> > > So, would it be possible to have some method of constructing a json::Value that formats as a 64-bit integer literal?
> > > Could this number formatting be changed? The default %g loses precision – you don't even get enough information to exactly reconstruct the same double you started with.
> > 
> > Yes, though is there an existing aware of round-trip safe double formatter in llvm?
> > I suspect this only actually matters when the values are integers, so we should consider your second suggestion first :-)
> > 
> > > I'd find it useful to be able to pass an arbitrary 64-bit integer through this system and still have the full 64 bits of integer value visible in the JSON output file, for the benefit of JSON consumers (e.g. Python json.load) that go above the call of duty in returning it as an integer without rounding it to the nearest representable double.
> > 
> > What about this design:
> > - Internally, a numeric value can be an integer or a double. i.e we split the internal `ValueType` `T_Number` into two, `T_Integer` and `T_Double`. public `Kind` remains unchanged.
> > - when constructing, you get one or the other depending on the static type
> > - when parsing, you get integer unless it has a nonzero decimal part or is out-of-range.
> > - `asDouble()` always succeeds
> > - `asInteger()` succeeds if the underlying value is integer or if it's a double that can be exactly represented as `int64_t` (same as now)
> > - when serializing, you get `%g` for double and the usual representation for integers
> > 
> > Open questions:
> > - is 1.2e3 a double or an integer? I kind of want the former, which complicates our heuristic.
> > - `int64_t` leaves anyone who wants `uint64_t` out in the cold. But adding more options for types is going to lead to madness. Can we live with this limitation?
> > 
> > If this sounds good I can start on the changes, but I'd like to defer adding new features to another patch if that's OK. This one is largely moving mostly-battleworn code from clangd, and new features need closer review of the implementation.
> That design certainly sounds as if it would do what I need, and a great deal more besides.
> 
> Another option that would be fine for me personally would be to have a means of constructing a `ValueType` called, say, `T_Custom`, which internally holds a string value, and serializes as exactly that string, unquoted. I could imagine that being used for other unusual purposes as well, such as controlling which of `\uXXXX` and UTF-8 was used to represent a non-ASCII character in a string literal.
> 
> (And that possibility is simple enough that I could add it myself as part of //my// patch.)
D46209 adds int64 support, and fixes use of %g to retain full precision.

`T_Custom` is a cool idea and I definitely don't want to rule it out, but large integers in particular seems like something common that should "just work" if possible.
(It's also unclear how a `T_Custom` could solve the problem on the *parse* side, which is nice to have)

Repository:
  rL LLVM

https://reviews.llvm.org/D45753