[PATCH] D94952: [clangd] Take into account what is in the index (symbols, references, etc.) at indexes merge

Aleksandr Platonov via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jan 26 00:33:05 PST 2021


ArcsinX added inline comments.


================
Comment at: clang-tools-extra/clangd/index/FileIndex.cpp:433
     PreambleSymbols.update(
-        Uri, std::make_unique<SymbolSlab>(std::move(*IF->Symbols)),
+        FilePath ? *FilePath : (consumeError(FilePath.takeError()), Uri),
+        std::make_unique<SymbolSlab>(std::move(*IF->Symbols)),
----------------
ArcsinX wrote:
> sammccall wrote:
> > ArcsinX wrote:
> > > sammccall wrote:
> > > > Is this change related? It changes the key scheme for the preamble index from URIs to paths, but I'm not sure why.
> > > > 
> > > > Do we have multiple URIs pointing to the same path? What are they concretely?
> > > This is the main thing in this patch. I will try to explain.
> > > We use these keys to create the file list, which is used by `indexedFiles()`.
> > > Currently, the preamble index contains URI's instead of paths (as a file list), that leads to the function returned by `PreambleIndex::indexedFiles()` always return `false` (because we pass to this function paths, not URI's). So, we always take data from the preamble index (but maybe we should not in some cases).
> > > 
> > Oooh... I'm not sure how I misunderstood the original so much :-( And I missed it in this patch description as well, apologies.
> > 
> > My impression was that the file list was derived from the index data, rather than from the keys, which were always intended to be opaque/arbitrary.
> > (At various times, these have been filenames, URIs, and other things IIRC. And until relatively recently, the preamble index keys were the file the preamble was built from, not the file containing the symbol!)
> > 
> > It feels like using URIs extracted from symbols might not be *completely* robust. Because having CanonicalDeclaration etc set to a file might not line up exactly with the idea that we indexed the file. But we do use this partitioning for FileShardedIndex, so it has to work well enough.
> > 
> > The advantage of using the extracted URIs would be: also works for non-file-sharded indexes like --index-file, avoid a bunch of conversion between URI and path, and we get to keep the simpler/flexible design for FileSymbols where the key is opaque.
> > 
> > Does this seem feasible to you?
> > that leads to the function returned by `PreambleIndex::indexedFiles()` always return `false` (because we pass to this function paths, not URI's)
> 
> This is a bit incorrect.
> We pass to this function URI, but this URI is converted to path. i.e. `MemIndex::indexedFiles()`, `Dex::indexedFiles()` expect that `Files` are paths, but they are URI's for the preamble index. That's why `PreambleIndex::indexedFiles()` always return `false`.
I also do not like these path <=> URI conversions. But what about empty files?, e.g.:
- open a file
- remove everything from this file
- the dynamic index has no symbols with definition/declaration from this file, so we do not have this file in the dynamic index file list.
- the static index has symbols with definition/declaration from this file, so we have this file in the static index file list.
- we will show stale results from the static index for this file.


Unsure, maybe it's ok to ignore the problem with empty files, seems this is the only case when the file was indexed, but we have no symbols located there.

Overall, I like the idea to use URI's instead of paths. I think we could implement it first as a separate patch and after that return to this one.

What do you think?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94952/new/

https://reviews.llvm.org/D94952



More information about the cfe-commits mailing list