[PATCH] D50385: [clangd] Collect symbol occurrences from AST.

Thu Aug 16 06:18:37 PDT 2018

ilya-biryukov added a comment.

In https://reviews.llvm.org/D50385#1193600, @ioeric wrote:

> In https://reviews.llvm.org/D50385#1193545, @hokein wrote:
>
> > In https://reviews.llvm.org/D50385#1191914, @ioeric wrote:
> >
> > > 2 high-level questions:
> > >
> > > 1. What's the reason for having a separate `SymbolOccurrenceSlab`? Could store occurrences as extra payload of `Symbol`?
> >
> >
> > Storing occurrences in `Symbol` structure is easy to misuse by users IMO -- if we go through this way, we will end up having a `getOccurrences`-like method in `Symbol` structure. Once users get the `Symbol` instance, it is natural for them to call `getOccurrences` to get all occurrences of the symbol. However this `getOccurrences` method doesn't do what users expected (just returning an incomplete set of results or empty). To query the symbol occurrences, we should always use index interface.
> >
> > Therefore, I think we should try to avoid these confusions in the design.
>
>
> Hmm, I think this is the same for other symbol payload e.g. definition can be missing for a symbol. And it seems to me that the concern is on the SymbolSlab level: if a slab is for a single TU, users should expect missing information; if a slab is merged from all TUs, then users can expect "complete" information. I think it's reasonable to assume that users of SymbolSlab are aware of this. I think it's probably not worth the overhead of maintaining and using two separate slabs.

I think it's reasonable to keep occurrences away from Symbol's Detail field. Stashing them together is only fine for the collector API, having any way to directly access occurrences through `Symbol` will be totally confusing for all the other users.
E.g., the `Index::lookup()` will not provide occurrences in the `Symbol` instances it returns, and if the accessors for those will be there it will only add confusion. So +1 to keeping them out of the `Symbol` class.

On the other hand, `SymbolSlab` feels like a perfectly reasonable place to store the occurrences in addition to the symbols themselves and it feels we should reuse its memory arena for storing any strings we need to allocate, etc.

>>> 2. Could we merge `SymbolOccurrenceCollector` into the existing `SymbolCollector`? They look a lot alike. Having another index data consumer seems like more overhead on the user side.
>> 
>> The `SymbolOccurrenceCollector` has many responsibilities (collecting declaration, definition, code completion information etc), and the code is growing complex now. Merging the `SymbolOccurrenceCollector` to it will make it more
> 
> Although the existing `SymbolCollector` supports different options, I think it still has a pretty well-defined responsibility: gather information about symbols. IMO, cross-reference is one of the property of symbol, and I don't see strong reasons to keep them separated.
> 
>> complicated -- we will introduce more option flags like `collect-symbol-only`, `collect-occurrence-only` to configure it for our different use cases (we need to the implementation detail clearly in order to make a correct option for `SymbolCollector`).
> 
> I think these options are reasonable if they turn out to be necessary. And making the SymbolCollector more complicated doesn't seem to be a problem if we are indeed doing more complicated work, but I don't think this would turn into a big problem as logic of xrefs seems pretty isolated.  Conversely, I think implementing xrefs in a separate class would likely to cause more duplicate and maintenance, e.g. two sets of options, two sets of initializations or life-time tracking of collectors (they look a lot alike), the same boilerplate factory code in tests, passing around two collectors in user code.
> 
>> And I can foresee these two collectors might be run at different point (runWithPreamble vs runWithAST) in dynamic index.
> 
> With some options, this should be a problem I think?

+1 to merging into the `SymbolCollector`. Keeping the responsibilities separate inside a single class should be easy, e.g. something like that should be simple enough:

  SymbolCollector::handleDeclOccurence(args) {
    this->processForSymbol(args); // handles keeping the Symbol structure up-to-date, i.e. adds definition locations, etc.
    this->processForOccurrences(args); // appends occurrences to a list of xrefs.
  };

The main advantage that we get is less clang-specific boilerplate. The less `IndexDataConsumer`s, `FrontendActionFactory`s, `FrontendAction`s we create, the more focused and concise our code is.
And in that case, `SymbolCollector` is already handling those responsibilities for us and reusing looks like a good idea.

Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D50385