[PATCH] D66751: [clangd] Add targetDecl(), which determines what declaration an AST node refers to.

Tue Aug 27 06:37:13 PDT 2019

sammccall added a comment.

First a wall of text to reply to the general comments. Let's chat if this doesn't make sense.

In D66751#1646545 <https://reviews.llvm.org/D66751#1646545>, @ilya-biryukov wrote:

> Mostly LGTM, thanks! Just a few clarifying questions and suggestions
>
> This API allows us to get a target declaration for a reference, but does not help with finding the source locations for those references. Do you plan to put this into the API? Have this as a separate function?

Yes. I actually ripped this out of the patch, because it wasn't related enough and the patch is large.

To restate the relationship between these functions:
Many operations ultimately want to join source location and decl, e.g. gtd is location -> node -> decl, xrefs is decl -> node -> location, indexing is node -> (location, decl).
So providing node->decl and node->location primitives is sufficient. If you want to go in the other direction (decl->node or location->node) then of course you need to traverse and/or build an index.

With that said, there are several legitimate node->location models, and I think this is a good reason to to keep that decoupled from node->decl (testability is another). In particular:

the "flat" model
================

says that each node may have a token which is its "handle", and each token can be the handle for (refer to) only one node.

  int a = X::y;
  iii a   X  y

What is the range for a node? Its handle token.
What is the node for a range? The unique handle token covered by the range.

Strengths: accurate, precise triggering on node names. simplicity (conforms to user's view of the code)
Weaknesses: multi-token ranges (selections), implicit nodes (they don't get to have locations), requires specific per-node implementation
Use case: go-to-definition/xrefs triggering, xrefs results (nodes referenced by name)

the "hierarchical" model
========================

says that each node owns a range of tokens, and these nest.

  int a = X::y;
          X
          XXX
  iii     yyyy
  aaaaaaaaaaaa

What is the range for a node? Its full range.
What is the node for a range? The innermost node that entirely contains the selection.

Strengths: exposing underlying structure/hierarchy of the code, can be (mostly) implemented generically
Weaknesses: loose triggering that doesn't distinguish e.g. node names from qualifiers
Use case: extract variable, structured selection, xrefs results (nodes referenced implicitly)

---

I think we're going to need/want both. The hierarchical model corresponds to SelectionTree. The flat model is the thing we need to build. libIndex looks more like the flat model than the hierarchical one, but has elements of both I think.
(If you think the above explanation is useful, we can check something like it in somewhere)

> E.g. finding a source location of field for DeclRefExpr produced for MyBase::field seems to be the same amount of work (in general case) as finding the Decl* of field.

For the hierarchical model it's easy, this is one of the few methods DynTypeNode actually has :-)
For the flat model: it's not trivial, but is less work because while you do have to dispatch over all node types, you don't have general multi-hop/graph cases (I think).
Usually the "handle" tokens are well-supported by the AST because they're precisely the ones that diagnostics want to report. So in the DeclRefExpr case, DeclRefExpr::getLoc() returns the right thing.

In D66751#1646675 <https://reviews.llvm.org/D66751#1646675>, @kadircet wrote:

> I agree with Ilya's concerns on SourceLocations, even though most of the time user will have the source locations available in the dyntyped node, they might need some traversals to fetch the exact range especially in cases of nested name specifiers.

I think if you want the (full) range, this is easy as mentioned above: getSourceRange() (available on DynTypedNode, but also on each specific node type).
If you want just the "handle" token for the flat model, NNS isn't a particularly hard case: it's basically a linked list. Each node claims one component of the NNS as its "handle". If you want to treat NNS as part of the parent (so pointing at X in X::y counts as a ref to Y) then I think you're better off with the hierarchical model.

> It would be nice to provide a referencing range in addition to the decls.

Concretely, can you give an example of which node and what you'd want it to point to? If it's just the single unqualified-name token, I agree and would like to add it (as a separate patch). If it's the full range, I think that's getSourceRange(). Is it something else?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66751/new/

https://reviews.llvm.org/D66751