[PATCH] D94952: [clangd] Take into account what is in the index (symbols, references, etc.) at indexes merge

Aleksandr Platonov via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jan 27 09:36:28 PST 2021


ArcsinX marked 4 inline comments as done.
ArcsinX added inline comments.


================
Comment at: clang-tools-extra/clangd/index/FileIndex.cpp:433
     PreambleSymbols.update(
-        Uri, std::make_unique<SymbolSlab>(std::move(*IF->Symbols)),
+        FilePath ? *FilePath : (consumeError(FilePath.takeError()), Uri),
+        std::make_unique<SymbolSlab>(std::move(*IF->Symbols)),
----------------
sammccall wrote:
> ArcsinX wrote:
> > ArcsinX wrote:
> > > sammccall wrote:
> > > > ArcsinX wrote:
> > > > > sammccall wrote:
> > > > > > Is this change related? It changes the key scheme for the preamble index from URIs to paths, but I'm not sure why.
> > > > > > 
> > > > > > Do we have multiple URIs pointing to the same path? What are they concretely?
> > > > > This is the main thing in this patch. I will try to explain.
> > > > > We use these keys to create the file list, which is used by `indexedFiles()`.
> > > > > Currently, the preamble index contains URI's instead of paths (as a file list), that leads to the function returned by `PreambleIndex::indexedFiles()` always return `false` (because we pass to this function paths, not URI's). So, we always take data from the preamble index (but maybe we should not in some cases).
> > > > > 
> > > > Oooh... I'm not sure how I misunderstood the original so much :-( And I missed it in this patch description as well, apologies.
> > > > 
> > > > My impression was that the file list was derived from the index data, rather than from the keys, which were always intended to be opaque/arbitrary.
> > > > (At various times, these have been filenames, URIs, and other things IIRC. And until relatively recently, the preamble index keys were the file the preamble was built from, not the file containing the symbol!)
> > > > 
> > > > It feels like using URIs extracted from symbols might not be *completely* robust. Because having CanonicalDeclaration etc set to a file might not line up exactly with the idea that we indexed the file. But we do use this partitioning for FileShardedIndex, so it has to work well enough.
> > > > 
> > > > The advantage of using the extracted URIs would be: also works for non-file-sharded indexes like --index-file, avoid a bunch of conversion between URI and path, and we get to keep the simpler/flexible design for FileSymbols where the key is opaque.
> > > > 
> > > > Does this seem feasible to you?
> > > > that leads to the function returned by `PreambleIndex::indexedFiles()` always return `false` (because we pass to this function paths, not URI's)
> > > 
> > > This is a bit incorrect.
> > > We pass to this function URI, but this URI is converted to path. i.e. `MemIndex::indexedFiles()`, `Dex::indexedFiles()` expect that `Files` are paths, but they are URI's for the preamble index. That's why `PreambleIndex::indexedFiles()` always return `false`.
> > I also do not like these path <=> URI conversions. But what about empty files?, e.g.:
> > - open a file
> > - remove everything from this file
> > - the dynamic index has no symbols with definition/declaration from this file, so we do not have this file in the dynamic index file list.
> > - the static index has symbols with definition/declaration from this file, so we have this file in the static index file list.
> > - we will show stale results from the static index for this file.
> > 
> > 
> > Unsure, maybe it's ok to ignore the problem with empty files, seems this is the only case when the file was indexed, but we have no symbols located there.
> > 
> > Overall, I like the idea to use URI's instead of paths. I think we could implement it first as a separate patch and after that return to this one.
> > 
> > What do you think?
> Right, completely empty files are going to be mishandled., in the same way that before your changes we mishandled files that had no results from the query.
> 
> I do still think this is the way to go though, because:
>  - completely empty files are much rarer
>  - again this is a peer to an FileShardedIndex issue where empty files get no shards. Therefore *within* the dynamic index, we'll never clear out the symbols for a file while the file is emptied
>  - having SymbolCollector explicitly the files indexed is the principled solution to both problems, and that seems feasible for us to do at some point.
> 
> > Overall, I like the idea to use URI's instead of paths. I think we could implement it first as a separate patch and after that return to this one.
> 
> That sounds great if you don't mind!
> (Is there a reason we can't land the rest of this patch as-is already?)
> (Is there a reason we can't land the rest of this patch as-is already?)
Seems we have no reason =)


================
Comment at: clang-tools-extra/clangd/index/Index.h:98
+
+inline constexpr IndexDataKind operator|(IndexDataKind L, IndexDataKind R) {
+  return static_cast<IndexDataKind>(static_cast<uint8_t>(L) |
----------------
sammccall wrote:
> I'd also consider `explicit operator bool()` so you can write `if (Indexed(File) & IndexContents::References)`, but a question of taste.
Unsure how I can do this =(
I could not add `operator bool()` as a method for enum class.
I see these solutions:
- keep it as is
- implement `bool operator !(IndexContents)` and use `!!`
- make `IndexContents` to be struct/class
- add a function with a simple name instead of `operator bool`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94952/new/

https://reviews.llvm.org/D94952



More information about the cfe-commits mailing list