[PATCH] D50385: [clangd] Collect symbol occurrences from AST.

Thu Aug 16 06:29:25 PDT 2018

hokein added a comment.

> Hmm, I think this is the same for other symbol payload e.g. definition can be missing for a symbol. And it seems to me that the concern is on the SymbolSlab level: if a slab is for a single TU, users should expect missing information; if a slab is merged from all TUs, then users can expect "complete" information. I think it's reasonable to assume that users of SymbolSlab are aware of this. I think it's probably not worth the overhead of maintaining and using two separate slabs.

My concerns of storing occurrences as an extra payload of `Symbol` are:

- `SymbolSlab` is more like an implementation detail.  Users of `SymbolIndex` are not aware of it, they only get `Symbol` objects, so it easily confuses users if they see any occurrence-related interface/member in `Symbol`. And we will write a looong comment explaining its correct behavior. It'd be better if we avoid this confusion in the API level.
- The fields in `Symbol` structure are symbol properties, and could be stored in memory. However, occurrences are not, we can't guarantee that.
- It seems that we are coupling `ID`, `Symbol`, `SymbolOccurrence` together: in the index implementation, we will go through ID=>Symbol=>Occurrences rather than ID=>Occurrences.

> I think these options are reasonable if they turn out to be necessary.

I think they are necessary. For collecting all occurrences for local symbols from the AST, we only need symbol occurrence information, other information (e.g. declaration&definition location, #include) should be discarded; Index for code completion should not collect symbol occurrences.

> And making the SymbolCollector more complicated doesn't seem to be a problem if we are indeed doing more complicated work, but I don't think this would turn into a big problem as logic of xrefs seems pretty isolated.

If xrefs is quite isolated, I think it is a good signal to have a dedicated class handling it.

> I think implementing xrefs in a separate class would likely to cause more duplicate and maintenance, e.g. two sets of options, two sets of initializations or life-time tracking of collectors (they look a lot alike), the same boilerplate factory code in tests, passing around two collectors in user code.

Merging xrefs to `SymbolCollector` couldn't avoid these problems, I think it is a matter of where we put these code:

- different initialization of `SymbolCollector` for different use cases (e.g. setting different flags in SymbolCollectorOptions).
- for dynamic index, index for xrefs and code completion would be triggered at different point: index for xrefs should happen when AST is ready; index for code completion happens when Preamble is ready;  we might end up with two slabs instances in the dynamic index (1 symbol slab + 1 occurrence slab vs. 2 symbol slabs).

The duplication is mainly about AST frontend action boilerplate code. To eliminate it, we could do some refactorings:

- get rid of the clang ast action code in SymbolCollector, and SymbolOccurrenceCollector
- introduce an `IndexSymbol` which is a subclass `index::IndexDataConsumer`
- the `IndexSymbol` has two mode (indexing symbol or indexing occurrence),  and dispatch ast information to `SymbolCollector`/`SymbolOccurrenceCollector`.

Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D50385