[PATCH] D105083: [clangd] Ensure Ref::Container refers to an indexed symbol

Tue Jul 6 23:27:03 PDT 2021

nridge added inline comments.

================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:489
     DeclRefs[ND].push_back(
-        SymbolRef{SM.getFileLoc(Loc), Roles, ASTNode.Parent});
+        SymbolRef{SM.getFileLoc(Loc), Roles, getRefContainer(ASTNode.Parent)});
   // Don't continue indexing if this is a mere reference.
----------------
kadircet wrote:
> nridge wrote:
> > kadircet wrote:
> > > as mentioned above I think we should start the traversal from `ASTNode.ParentDC`. any downsides to doing that ? I don't see a case where this container we return is **not** a `DeclContext`.
> > I assume you mean `ASTNode.ContainerDC`.
> > 
> > I tried this suggestion, but it looks like `ContainerDC` does not contain what we want in many cases.
> > 
> > For all of the following cases in `SymbolCollectorTest.RefContainers`:
> > 
> >  * ref2 (variable initializer)
> >  * ref3 (function parameter default value)
> >  * ref5 (template parameter default value)
> >  * ref6 (type of variable)
> >  * ref7a (return type of function)
> >  * ref7b (parameter type of function)
> > 
> > `ASTNode.ContainerDC` is the `TranslationUnitDecl` (i.e. the `DeclContext` in which the function or variable is declared) rather than the function or variable itself.
> > 
> > In addition, for ref4 (member initializer), `ContainerDC` is the `RecordDecl` rather than the `FieldDecl`.
> > 
> > So, I don't think this suggestion will work without making the `Ref::Container` field significantly less specific (and thereby the call hierarchy feature less useful in some cases).
> > 
> > I assume you mean ASTNode.ContainerDC.
> 
> yes, sorry for the mess.
> 
> > So, I don't think this suggestion will work without making the Ref::Container field significantly less specific (and thereby the call hierarchy feature less useful in some cases).
> 
> I can't seem to remember why we went with it in the first place. But now:
> ```
> int foo();
> int x = foo();
> void bar() {
>   int y = foo();
> }
> ```
> 
> calling call hierarchy on `foo` will yield `x` and `bar`, creating some sort of discrepancy. I am not really sure if telling first call to `foo` is contained in `x` is really useful compared to saying it was contained in `filename.cpp` (but ofc we can't do that today as container needs to be a symbol in the index, hence the ref will be lost instead). do you also think that would be more useful (i am not really a user of this feature so I would like to hear which is better from some users)?
> 
> also I can't say that I find the discrepancy between an initializer in parmvardecl/nontypetemplateparm vs vardecl useful (i know it already exists today, but not really sure why).
> they are actually attached to the first declcontext containing them, not the first decl (which you say is more useful, let's leave the fact that they are not indexed out for a while).
> moreover it is a behavior we rely on through libindex, as it just marks `ASTNode.Parent` with declcontext rather than making use of the vardecl. which makes me believe it is not really important to have the top-level symbol as the container. But it is the only way we can retain some container information. 
> if that's the case, I believe we should make this more explicit, i.e. start a traversal from the node itself, remember the last indexed decl we've seen as a parent and return that if we failed to find any indexed declcontext's containing the decl (with a fixme saying that we could have some file symbols in the index for translationunitdecls).
> That way we can say that the container is the first `DeclContext` containing the reference (with the exception of fixme mentioned).
> I am not really sure if telling first call to foo is contained in x is really useful compared to saying it was contained in filename.cpp [...] do you also think that would be more useful (i am not really a user of this feature so I would like to hear which is better from some users)?

In my opinion, in cases where a reference to a function occurs in the initializer (or otherwise in the declaration, such as perhaps in the type) of a namespace-scope variable, it is significantly more useful to show the name of that variable in the call hierarchy, than the name of the containing file.

Consider an example like this:

```
void FlushAndClearCaches();
...
// elsewhere, at namespace scope:
std::function<void()> OnMemoryPressureCallback = &FlushAndClearCaches;
```

If I invoke call hierarchy on `FlushAndClearCaches` (or one of its callees), and I'm looking at the children of the `FlushAndClearCaches` node in the call hierarchy tree, it is much more informative if the name of the relevant child item is `OnMemoryPressureCallback`, than if it is a file name. If it's a file name, I need to take the extra step of clicking on the item to navigate to the target file and the reference location inside it, and look there, whereas if it's a variable name I may have all the information I need just from the call hierarchy view.

I think being able to eyeball the call hierarchy view and get and idea of what sorts of functionality calls into a function, without having to click on each item and examine the context, is an important use case.

(It's fair to ask if this logic also applies to local variables. I would say no: local variables are more likely to be implementation details with un-interesting names. The name of the containing function is typically the more interesting part, and is still more granular than a file.)

I appreciate that on a technical level, this is an inconsistency between local variables vs. namespace-scope variables. However, from the point of view of a user of call hierarchy, I think this combination gives the best results.

I am also open to revising what we store in `Ref::Container` if additional consumers (other than call hierarchy) come up, but for now I think it makes sense to let it be guided by what it useful for call hierarchy. (I surveyed the Eclipse CDT codebase, where the corresponding piece of information in the index has one other consumer besides call hierarchy: it's used to annotate entries in the find-references view with the name of the calling function (https://github.com/clangd/clangd/issues/177). Were we to add this feature, I think we'd want to keep the same treatment of variables as for call hierarchy, for similar reasons.)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105083/new/

https://reviews.llvm.org/D105083