[PATCH] D40485: [clangd] Introduced a Context that stores implicit data

Wed Dec 6 07:18:30 PST 2017

sammccall added a comment.

There's a couple of points where we might be a bit stuck on complexity vs X tradeoffs (map + contextbuilder, and some of the convenience APIs).

Just want to mention: if I find these things complex, it doesn't mean I think they're bad code, or even that you should agree they're complex! Happy to pull in a third opinion if you think we can't or shouldn't live without them.

================
Comment at: clangd/Context.h:11
+// Context for storing and retrieving implicit data. Useful for passing implicit
+// parameters on a per-request basis.
+//
----------------
This could use a bit more I think, e.g.

    A context is an immutable container for per-request data that must be propagated through layers that don't care about it.
    An example is a request ID that we may want to use when logging.

    Conceptually, a context is a heterogeneous map<Key<T>, T>. Each key has 
    an associated value type, which allows the map to be typesafe.

    You can't add data to an existing context, instead you create a new immutable context derived from it with extra data added. When you retrieve data, the context will walk up the parent chain until the key is found.

    Contexts should be:
     - passed by reference when calling synchronous functions
     - passed by value (move) when calling asynchronous functions. The callback will receive the context again.
     - copied only when 'forking' an asynchronous computation that we don't wait for

Some of this is on the class comment - this seems fine but the Context class should then be the first thing in the file.

================
Comment at: clangd/Context.h:25
+
+class ContextData {
+public:
----------------
IIUC, the only reason we expose separate `Context` and `ContextData` types is to give things like Span a stable reference to hold onto (`ContextData*`).

Could we use `Context` instead (a copy)? Then we incur `shared_ptr` overhead for span creation, but greatly simplify the interface.
If we want to avoid the overhead, the Span could maybe grab whatever it needs out of the context at construction time, for use in the destructor.

Either way, we should make sure `Context` is front-and-center in this header, and I hope we can hide ContextData entirely.

================
Comment at: clangd/Context.h:32
+  /// specified for \p Key, return null.
+  template <class Type> Type *get(Key<Type> &Key) const {
+    if (auto Val = Data.get(Key))
----------------
nit: const Key&, and below

================
Comment at: clangd/Context.h:32
+  /// specified for \p Key, return null.
+  template <class Type> Type *get(Key<Type> &Key) const {
+    if (auto Val = Data.get(Key))
----------------
sammccall wrote:
> nit: const Key&, and below
this isn't usually how we define const-correctness for containers.
A const method shouldn't return a mutable reference to the contained object.

================
Comment at: clangd/Context.h:42
+  /// Must not be called for keys that are not in the map.
+  template <class Type> Type &getExisting(Key<Type> &Key) const {
+    auto Val = get(Key);
----------------
I'd prefer the standard name `at` for this function (assert vs throw being pretty minor I think), but YMMV

================
Comment at: clangd/Context.h:49
+
+  /// A helper to get a string value as StringRef. Returns empty StringRef if \p
+  /// StrKey does not exist in a map.
----------------
This is unused - remove until we need it?

================
Comment at: clangd/Context.h:61
+  /// exist in a map.
+  template <class Type> Type *getPtr(Key<Type *> &Key) const {
+    if (auto Val = get(Key))
----------------
This is only used where getExisting() would be better - remove until we need it?

================
Comment at: clangd/Context.h:68
+private:
+  // We need to make sure Parent is destroyed after Data. The order of members
+  // is important.
----------------
Nit: is destroyed after --> outlives?

If you're going to repeat that this is important, then say why :-)

================
Comment at: clangd/Context.h:82
+/// used as parents for some other Contexts.
+class Context {
+public:
----------------
If we're going to define this as a real class (rather than use/inherit `shared_ptr`), we should really be able to put the API on it directly (rather than require the user to `->`) 

================
Comment at: clangd/Context.h:96
+
+  ContextBuilder derive();
+
----------------
If we discourage copies, you probably want a moving `ContextBuilder derive() &&` overload

================
Comment at: clangd/Context.h:96
+
+  ContextBuilder derive();
+
----------------
sammccall wrote:
> If we discourage copies, you probably want a moving `ContextBuilder derive() &&` overload
nit: const

================
Comment at: clangd/Context.h:139
+/// that require a Context when no explicit Context is not available.
+Context &emptyCtx();
+
----------------
nit: how do you feel about Context::empty()? it ties the name to the type more clearly, I think.

================
Comment at: clangd/Context.h:139
+/// that require a Context when no explicit Context is not available.
+Context &emptyCtx();
+
----------------
sammccall wrote:
> nit: how do you feel about Context::empty()? it ties the name to the type more clearly, I think.
OOC: why reference and not a value?

================
Comment at: clangd/Context.h:142
+/// Creates a ContextBuilder with a null parent.
+ContextBuilder buildCtx();
+
----------------
I think we should spell this `emptyCtx().derive()`.
It should be pretty rare to derive from the empty context in production code - and conceptually it's no different than any other use of the empty context, I think.

================
Comment at: clangd/Context.h:65
+  Context *Parent;
+  TypedValueMap Data;
+};
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > We add complexity here (implementation and conceptual) to allow multiple properties to be set at the same level (vs having a key and an AnyStorage and making Context a linked list).
> > Is this for performance? I'm not convinced it'll actually be faster for our workloads, or that it matters.
> Conceptually, a `Context` is more convenient to use when it stores multiple values. This allows to put a bunch of things and assign meaning to `Context` (i.e., a `Context` for processing a single LSP request, global context). If `Context`s were a linked list, the intermediate `Context`s would be hard to assign the meaning to.
> 
> That being said, storage strategy for `Context`s is an implementation detail and could be changed anytime. I don't have big preferences here, but I think that storing a linked list of maps has, in general, a better performance than storing a linked list.
> And given that it's already there, I'd leave it this way.
With the new shared_ptr semantics:

     Context D = move(C).derive(K1, V1).derive(K2, V2);

Is just as meaningful as

    Context D = move(C).derive().add(K1, V1).add(K2, V2);

Yeah, the list of maps in an implementation detail. It's one that comes with a bunch of complexity (`ContextBuilder` and most of `TypedValueMap`). It really doesn't seem to buy us anything (the performance is both uninteresting and seems likely to be worse in this specific case with very few entries). 

================
Comment at: clangd/TypedValueMap.h:1
+//===--- TypedValueMap.h - Type-safe heterogenous key-value map -*- C++-*-===//
+//
----------------
ilya-biryukov wrote:
> ilya-biryukov wrote:
> > sammccall wrote:
> > > This might be doing a little more than it needs to.
> > > 
> > > Do we need the ability to have multiple values of the same type?
> > > If request ID is an int, needing to do `struct RequestID { ID int };` doesn't seem like much of a burden, and simplifies the semantics here.
> > > 
> > > And for cases like Logger where the type carries the semantics, keying by type is clearer.
> > Are `Key<>` types confusing? I really like the fact that I don't have to specify type information and it is carried in the `Key<Type>` to the `get`/`emplace` methods, i.e. I don't have to specify template arguments there.
> > I don't see how `struct RequestId { ID int };` is clearer or shorter than `Key<int> RequestID;`. Again, are `Key<Type>` a confusing concept in the first place?
> Unfortunately, `typeid` does not with `-fno-rtti`, so there does not seem to be a way to significantly simplify the implementation (i.e. we'll need some form of `Key<Type>` anyway, at least internally).
> Therefore I'd keep the current interface. Does that sound ok?
Yes, I find `Key<T>` confusing compared to a type-based interface. I can live with  it though - it's certainly conceptually nicer.

Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D40485