[PATCH] D92290: [clangd] Factor out the heuristic resolver code into its own class

Mon Jan 18 09:36:56 PST 2021

sammccall added a comment.

Thanks a lot for doing this!
I have some thoughts on how we can give it a nice API/more clearly separate it from FindTarget/improve internal structure.

I'm not sure how we should sequence that/how much to do.
One the one hand, simply moving this code is progress! On the other hand, after splitting it the API is more relevant.
We can do certainly land this as mostly moving the code with its existing structure if you like, and I or you could try to clean up the interface later.

I would like to get your feedback on the ideas though as I spent a *long* time staring at this code today and want to make sure I'm not crazy :-)

In D92290#2504161 <https://reviews.llvm.org/D92290#2504161>, @nridge wrote:

> Change approach to factor out a HeuristicResolver class
>
> I didn't make it an optional parameter to allTargetDecls() etc. If you feel there are call sites which would benefit from //not// passing one in, we can change this.

As far as I know we should always have this available in production, so I don't think it needs to be optional in the signature.
However I think it should be *nullable* so we can test this separately, if that makes sense?

================
Comment at: clang-tools-extra/clangd/FindTarget.cpp:453
     case NestedNameSpecifier::TypeSpecWithTemplate:
-      add(QualType(resolveNestedNameSpecifierToType(NNS), 0), Flags);
+      add(QualType(Resolver->resolveNestedNameSpecifierToType(NNS), 0), Flags);
       return;
----------------
only `Identifier` needs the heuristic resolution, TypeSpec[WithTemplate] can just pull out the stored type

================
Comment at: clang-tools-extra/clangd/HeuristicResolver.cpp:114
+  }
+  if (const auto *ME = dyn_cast<MemberExpr>(E))
+    return {ME->getMemberDecl()};
----------------
We shouldn't need these cases, if we're only trying to support dependent E.
(And otherwise, there are lots of other cases we could potentially need to support here eventually)

================
Comment at: clang-tools-extra/clangd/HeuristicResolver.h:37
+
+class HeuristicResolver {
+public:
----------------
This could use some class documentation explaining the idea - basically that clang's AST generally stores only fully-resolved edges, and therefore template ASTs are missing lots of information that we can guess in common cases.

================
Comment at: clang-tools-extra/clangd/HeuristicResolver.h:37
+
+class HeuristicResolver {
+public:
----------------
sammccall wrote:
> This could use some class documentation explaining the idea - basically that clang's AST generally stores only fully-resolved edges, and therefore template ASTs are missing lots of information that we can guess in common cases.
We should have some shared contract around how cases where heuristics are not needed (i.e. non-dependent code). And given the current functions, it's not obvious what this contract should be.

On the one hand, if we attempt to handle such cases here, I'm not sure we've really separated this from FindTarget - e.g. it's hard to find a crisp definition of resolveExprToDecls that's distinct from targetDecls(). And we end up duplicating code between the two.
On the other hand, if we don't, then internal calls (like the implementation of `resolveNestedNameSpecifierToType`) are really awkward.

In either case, I think the resolution is to make sure the scope of the heuristic bits are as narrow/isolated as possible. Then if we want to handle all cases, it's manageable to do so. And if we don't, we can easily define a peer e.g. `resolveNNSToType(const NNS&, const HeuristicResolver *)` either here or in AST.h.

(Currently it appears the actual code currently handles some but not all non-dependent cases)

================
Comment at: clang-tools-extra/clangd/HeuristicResolver.h:50
+  // for a call to vector<T>::size().
+  // The name to look up is provided in the form of a factory that takes
+  // an ASTContext, because an ASTContext may be needed to obtain the
----------------
Doc seems stale

================
Comment at: clang-tools-extra/clangd/HeuristicResolver.h:55
+  std::vector<const NamedDecl *>
+  resolveDependentMember(const Type *T, DeclarationName NameFactory,
+                         llvm::function_ref<bool(const NamedDecl *ND)> Filter);
----------------
This seems like a really nice example of a heuristic lookup with a clear contract :-)

================
Comment at: clang-tools-extra/clangd/HeuristicResolver.h:56
+  resolveDependentMember(const Type *T, DeclarationName NameFactory,
+                         llvm::function_ref<bool(const NamedDecl *ND)> Filter);
+
----------------
nit: const throughout?

================
Comment at: clang-tools-extra/clangd/HeuristicResolver.h:60
+  // or more declarations that it likely references.
+  std::vector<const NamedDecl *> resolveExprToDecls(const Expr *E);
+
----------------
This is pretty vaguely defined. From reading the implementation, this can choose to return two things:

1. a ValueDecl representing the value of E
2. a TypeDecl representing the type of E

Case 1 is used by heuristic go-to-def, which never hits case 2 because it only passes certain kinds of E.
Case 1+2 are used internally by the heuristic resolver to resolve base types, and are unified by resolveDeclsToType which handles them as two cases.

This doesn't seem like a clean public API :-) It also seems to lead to some convolution internally.

--- 

I think we should distribute the work of this function into a few smaller functions with couple of flavors.

Some that deal with specific node types, extracted from resolveExprToDecls:
 - `Decl* memberExprTarget(CXXDependentMemberExpr*)`. Uses exprToType (for the base type), pointee (for arrow exprs), resolveDependentMember (the final lookup).
 - `Type* callExprType(CallExpr*)`. Uses exprToType (for the callee type).
 - `Decl* declRefExprTarget(DependentScopeDeclRefExpr*)` - this is currently a trivial wrapper for resolveDependentMember.

And some that are more generic building blocks:
 - `Type* exprType(Expr*)`, which tries to handle any expr, dispatching to the node-specific function and falling back to Expr::getType().
 - `getPointeeType(Type*)` and `resolveDependentMember(...)` as now

It's not entirely clear what should be public vs private, but one idea I quite like is: all the node-specific methods are public, and all the building blocks are private.
- This would mean adding some more node-specific methods to cover the external callers of `resolveDependentMember`, but in each case I looked at this seems to make sense.
  (This neatly deals with the slightly grungy signature of resolveDependentMember by hiding it)
- it also mostly answers the question about contracts on non-dependent code: most non-dependent node types simply can't be passed in

================
Comment at: clang-tools-extra/clangd/HeuristicResolver.h:64
+  // specifier.
+  const Type *resolveNestedNameSpecifierToType(const NestedNameSpecifier *NNS);
+
----------------
The contract is a bit fuzzy here, and both the caller and callee have to deal with NNS recursion. It could be narrowed by requiring the resolved parent type to be passed in...

================
Comment at: clang-tools-extra/clangd/ParsedAST.h:113

+  HeuristicResolver *getHeuristicResolver() const { return Resolver.get(); }
+
----------------
is this intended to be nullable? (If not, why the pointer type?)

================
Comment at: clang-tools-extra/clangd/ParsedAST.h:150
   CanonicalIncludes CanonIncludes;
+  std::unique_ptr<HeuristicResolver> Resolver;
 };
----------------
Optional<HeuristicResolver> or simply HeuristicResolver?

================
Comment at: clang-tools-extra/clangd/unittests/FindTargetTests.cpp:90
+    for (const auto &Entry :
+         allTargetDecls(N->ASTNode, AST.getHeuristicResolver()))
       ActualDecls.emplace_back(Entry.first, Entry.second);
----------------
If we really want these to be separate, I think we should be using a null resolver except for the tests that are explicitly testing heuristics resolution. (And those tests could check both with/without resolver and verify it's required...)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92290/new/

https://reviews.llvm.org/D92290