[PATCH] D92290: [clangd] Factor out the heuristic resolver code into its own class

Fri Feb 12 08:42:33 PST 2021

sammccall added a comment.

Sorry for the looong delay getting back to this :-\

I do think this should be nullable (and const-friendly) but otherwise this is good to land.

================
Comment at: clang-tools-extra/clangd/HeuristicResolver.h:73
+  std::vector<const NamedDecl *>
+  resolveDependentMember(const Type *T, DeclarationName NameFactory,
+                         llvm::function_ref<bool(const NamedDecl *ND)> Filter);
----------------
nit: NameFactory is a weird name now

================
Comment at: clang-tools-extra/clangd/HeuristicResolver.h:60
+  // or more declarations that it likely references.
+  std::vector<const NamedDecl *> resolveExprToDecls(const Expr *E);
+
----------------
nridge wrote:
> sammccall wrote:
> > This is pretty vaguely defined. From reading the implementation, this can choose to return two things:
> > 
> > 1. a ValueDecl representing the value of E
> > 2. a TypeDecl representing the type of E
> > 
> > Case 1 is used by heuristic go-to-def, which never hits case 2 because it only passes certain kinds of E.
> > Case 1+2 are used internally by the heuristic resolver to resolve base types, and are unified by resolveDeclsToType which handles them as two cases.
> > 
> > This doesn't seem like a clean public API :-) It also seems to lead to some convolution internally.
> > 
> > --- 
> > 
> > I think we should distribute the work of this function into a few smaller functions with couple of flavors.
> > 
> > Some that deal with specific node types, extracted from resolveExprToDecls:
> >  - `Decl* memberExprTarget(CXXDependentMemberExpr*)`. Uses exprToType (for the base type), pointee (for arrow exprs), resolveDependentMember (the final lookup).
> >  - `Type* callExprType(CallExpr*)`. Uses exprToType (for the callee type).
> >  - `Decl* declRefExprTarget(DependentScopeDeclRefExpr*)` - this is currently a trivial wrapper for resolveDependentMember.
> > 
> > And some that are more generic building blocks:
> >  - `Type* exprType(Expr*)`, which tries to handle any expr, dispatching to the node-specific function and falling back to Expr::getType().
> >  - `getPointeeType(Type*)` and `resolveDependentMember(...)` as now
> > 
> > It's not entirely clear what should be public vs private, but one idea I quite like is: all the node-specific methods are public, and all the building blocks are private.
> > - This would mean adding some more node-specific methods to cover the external callers of `resolveDependentMember`, but in each case I looked at this seems to make sense.
> >   (This neatly deals with the slightly grungy signature of resolveDependentMember by hiding it)
> > - it also mostly answers the question about contracts on non-dependent code: most non-dependent node types simply can't be passed in
> Thanks, you make some good points about this method being messy (and particularly that we're duplicating some "resolve the type of a non-dependent expression" logic, which has bothered me as well).
> 
> I like your proposed breakdown into smaller methods, and I partially implemented it.
> 
> A couple of notes:
> 
>  * I had to keep the [MemberExpr case](https://searchfox.org/llvm/rev/b3d7e761e347d562333893652dcf3837fa55d777/clang-tools-extra/clangd/FindTarget.cpp#218) around. It could not be subsumed by `Expr::getType()`, because for a `MemberExpr` that returns [this thing](https://searchfox.org/llvm/rev/b3d7e761e347d562333893652dcf3837fa55d777/clang/include/clang/AST/BuiltinTypes.def#283), while `getMemberDecl()->getType()` returns the function type of the member function, and we want the latter (for [this call site](https://searchfox.org/llvm/rev/b3d7e761e347d562333893652dcf3837fa55d777/clang-tools-extra/clangd/FindTarget.cpp#205)).
>  * We don't quite achieve "non-dependent node types simply can't be passed in", because `resolveCallExpr()` can be called with a non-dependent call expr. However, it currently has no external callers -- perhaps we should make it private?
> 
> I left `resolveDependentMember()` public (and left its external callers alone) for now. I can make it private and add additional public wrappers if you'd like, but I'm wondering what the motivation is (especially as you've described it as "a really nice example of a heuristic lookup with a clear contract" :-)).
Those notes/caveats seem totally fine to me.

> resolveCallExpr ... perhaps we should make it private?
>  left resolveDependentMember() public ... I can make it private... I'm wondering what the motivation is

I just think it's a little easier to understand how to use this class, and how to maintain/extend it, if the public methods form a consistent set with the same level of abstraction. Having all the public methods handle node types is tempting from this point of view.

I won't insist on this. We can inline the handling of UnusedUsingValueDecl into the caller as it happens to be simple, but I don't see a strong reason to do so.

================
Comment at: clang-tools-extra/clangd/HeuristicResolver.h:64
+  // specifier.
+  const Type *resolveNestedNameSpecifierToType(const NestedNameSpecifier *NNS);
+
----------------
nridge wrote:
> sammccall wrote:
> > The contract is a bit fuzzy here, and both the caller and callee have to deal with NNS recursion. It could be narrowed by requiring the resolved parent type to be passed in...
> I'm a bit unclear on what you mean by "both the caller and the callee have to deal with NNS recursion".
> 
> Looking through the callees, other than the recursive call in the implementation they all just make a single call to `resolveNestedNameSpecifiedToType()` and don't deal with NNS recursion?
> I'm a bit unclear on what you mean by "both the caller and the callee have to deal with NNS recursion".

You're right, I'm not sure what I was thinking here.

Maybe change the doc on this function to a "a dependent nested name specifier (i.e. Kind == Identifier)" and then adding an implementation comment "also handle TypeSpec case needed for recursion" or something...
I think that would make the contract clearer and more consistent with other functions.

================
Comment at: clang-tools-extra/clangd/unittests/FindTargetTests.cpp:90
+    for (const auto &Entry :
+         allTargetDecls(N->ASTNode, AST.getHeuristicResolver()))
       ActualDecls.emplace_back(Entry.first, Entry.second);
----------------
nridge wrote:
> sammccall wrote:
> > nridge wrote:
> > > sammccall wrote:
> > > > If we really want these to be separate, I think we should be using a null resolver except for the tests that are explicitly testing heuristics resolution. (And those tests could check both with/without resolver and verify it's required...)
> > > I'm not sure what you mean by "if we really want these to be separate". What are "these"?
> > I mean the separation between the "hard" AST logic in FindTarget and the heuristic resolution. And by "we" I really mean "I", I guess :-)
> > 
> > Ideally we'd test these both separately and smoke-test the interaction, but I think the low-hanging fruit is to use all the existing tests, with most of them testing FindTarget in isolation and the few exceptions testing FindTarget+HeuristicResolver together.
> My instict is to keep the codepath exercised by the tests as close as possible to the codepath exercised in production, because I've been bitten many times by "1. Looks like we've regressed scenario X. 2. But we have a test for scenario X. 3. Oh, but the test does something subtly different from what happens in production." So, if in production we always pass a non-null `HeuristicResolver`, I would prefer that we do so in the test as well.
> 
> Perhaps, as a compromise, we could annotate heuristic results as such (perhaps using a new `DeclRelation::Heuristic` flag?), and check for the expected presence / absence of that flag in the tests?
Adding a DeclRelation flag means these results will be excluded unless we explicitly include Heuristic in the filter.
(Maybe that's a good thing? Seems easy to forget though.)

Another way to solve this in the tests is to run them *all* with/without the heuristic resolver and mark the testcases where heuristic is required for them to pass.

I really would like the resolver to be nullable to enforce the layering, and to ensure the core functionality doesn't grow complex dependencies that would preclude its use in new places (even if only for prototyping). e.g. if targetDecl requires heuristic resolver, and that grows a dep on sema, then it might be hard to use in places where we don't have ParsedAST (background indexing maybe?)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92290/new/

https://reviews.llvm.org/D92290