[PATCH] D59481: [clangd] Count number of references while merging RefSlabs inside FileIndex
Kadir Cetinkaya via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Apr 25 09:27:17 PDT 2019
kadircet marked 2 inline comments as done.
kadircet added a comment.
It has been a long time since I've proposed that change and even I forgot some of the high level details. Therefore, I wanted to sum up the state again so that we can decide on how to move forward.
Currently we have two different on-disk formats for index in clangd:
- Static index:
- Merged single file, since merging is done before persistence, the data on-disk contains correct reference counts.
- Produced by running SymbolCollector on each TU. Collector doesn't keep state between different TUs, merging is done on the caller-site.
- No need to count references when loading, doesn't interact with `FileSymbols`.
- Background index:
- Sharded multiple files.
- Produced by running SymbolCollector on each TU, but instead of merging we separate those symbols into distinct files. Merging is done later on within `FileSymbols` which has no information about TUs.
FileSymbols is currently being used by BackgroundIndex and FileIndex(Dynamic Idx):
- In the case of BackgroundIndex, a symbol occurs at most twice(once in the header declaring it and once in the implementation file, if they are distinct), therefore when merging is done symbol references doesn't go above that.
- FileIndex doesn't perform de-duplication before-hand therefore it can perform reference counting while performing the merge. But currently it doesn't perform any merging.
As a result, changing `FileSymbols` would only effect `BackgroundIndex` and nothing else. Since merging occurs using the information in `RefSlabs` this can also be extended to benefit `FileIndex` and will most definitely be used
by any sort of indexing that performs merging over multiple files.
For unification purposes I would rather delete the `References` counting logic in `SymbolCollector` and perform it only at mergers(clangd-indexer and FileSymbols)
Comment at: clang-tools-extra/clangd/index/Background.h:113
+ // Note that FileSymbols counts References by incrementing it per each file
+ // mentioning the symbol, including headers. This contradicts with the
> We should agree on and stick to a single behavior in both cases.
> I suggest keeping the current behavior for now (translation units). Why can't we implement that?
Because `FileSymbols` only knows about file names and has no idea about whether a given file is TU or not. We might add some heuristics on the file extensions, or change SymbolCollector to count number of files(this would require maintaining a cache to de-duplicate references coming from the same files but originating from a different TU).
Comment at: clang-tools-extra/clangd/index/FileIndex.cpp:52
+ // If this is the first time we see references to a symbol, reset its
+ // Reference count, since filesymbols re-count number of references
> The comment suggests there's something cheesy going on.
> Why would FileSymbols recompute the number of references? Can we avoid this complicated logic?
This was the idea behind https://github.com/clangd/clangd/issues/23. Please see the main comment for a higher level discussion.
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
More information about the cfe-commits