[PATCH] D97535: [clangd] Use URIs instead of paths in the index file list

Aleksandr Platonov via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Mar 2 02:36:55 PST 2021


ArcsinX added inline comments.


================
Comment at: clang-tools-extra/clangd/index/FileIndex.cpp:292
+    for (const auto &Sym : *Slab) {
+      if (Sym.Definition)
+        Files.insert(Sym.Definition.FileURI);
----------------
kadircet wrote:
> ArcsinX wrote:
> > kadircet wrote:
> > > it feels weird to choose one or the other here, why not just insert both (after checking if definition exists, ofc).
> > > 
> > > We are likely to have a merged symbol information anyway, and the logic here will likely result in no index owning the header files, unless there are some symbols *defined* (not just declared) in it.
> > > 
> > > This will likely result in some overshooting as knowing about a symbols declaration location doesn't mean indexing that particular file. But so does the current version, as knowing about definition location might be because of a merged symbol, rather than indexing of particular file.
> > I will try to explain on example:
> > 
> > - **test.h**
> > ```
> > void f();
> > ```
> > - **test.c**
> > ```
> > #include "test.h"
> > void f() { }
> > ```
> > - compile_commands.json contains a command for **test.c** compilation.
> > 
> > Scenario:
> > - open **test.c**
> > - try to find all references for `f()`
> > 
> > For that scenario result for `find all references` will be incorrect if both (decl and def) files are in the file list because:
> > - decl location is inside **test.h**
> > - def location is inside **test.c**
> > - the file list for the main file index contains **test.h** and **test.c**
> > - the main file index does not contain references from **test.h**
> > - the background (static) index contains references from **test.c**, but results from the background index will be skipped, because **test.h** is in the main file (dynamic) index file list.
> Ah i see. This all seems really fragile :/ We might as well have something like:
> 
> a.h:
> ```
> struct Bar;
> ```
> 
> a.cc:
> ```
> #include "a.h"
> struct Bar;
> ```
> 
> and now as soon as you open a.cc, all the results from a.h will be gone, because canonical declaration location for `Bar` will be in `a.h` and there is no definition. (and i believe this kind of forward-decl madness is quite common in at least LLVM). 
> Even worse, the same will also happen even if you have definition in the header, but a forward decl in the main file, so even accepting the file for definition wouldn't be enough.
> 
> It is currently (i.e. without this patch) working as expected, as main file index only owns the information for the "main file" indexing was initiated for. This feels like a big regression to me (that I didn't notice initially, sorry for that), but I am ready to be convinced otherwise :D
> 
> As Sam mentioned what you do here and partitioning logic in FileShardedIndex is quite similar (yours undershoot, sharding logic overshoots) but in the sharding process we split indexing result of a full TU/preamble, and later on those shards will always be used in a merged fashion (e.g. when a.cc and b.cc, both including a.h gets indexed, the shard produced for a.h from b.cc won't contain any definition locations, but the shard produced for a.cc will know about those symbols definition locations and in a merged view those symbols will have all the necessary information.), whereas in here the resulting information is used in isolation (main file isn't merged with preamble symbols, but mixes Files view). Hence causing such regressions.
> 
> I think the proper thing to do here is to propagate relevant files with slabs on `FileSymbols::update`. What you do here and sharding isn't very different. The question is should we have a:
> - string File
> - optional<string> File
> - vector<string> Files
> 
> Currently we always have exactly one File associated with all of those slabs as we either:
> - always do sharding (preamble and background idx)
> - even though symbol informations tell otherwise, we've only processed a single file (main file idx)
> 
> We would need 3rd option if we were to use filesymbols with a monolithic index, but we don't. And even if we need such a thing in future it shouldn't be a huge change hopefully.
> 
> 
> Sorry for the wall of text, I hope I do make sense, but please tell me if I misunderstood/missing something.
> It is currently (i.e. without this patch) working as expected, as main file index only owns the information for the "main file" indexing was initiated for. This feels like a big regression to me (that I didn't notice initially, sorry for that), but I am ready to be convinced otherwise :D

Currently it works mostly as expected. The only thing which worries me is the preamble index update: the first parameter passed to `PreambleSymbols.update()` is an URI, but we expect a file path there. Thus, the file list of the preamble index contains URIs, which seems incorrect and `PreambleIndex::indexedFiles()` always returns `IndexContents::None`.

In other words, we need to make the file list to contains always file paths or always URIs (current situation: the preamble index contains URIs, other indexes contain file paths). I thought that the best solution is to convert URI to file path before `PreambleSymbols.update()` call, but after D94952#inline-892421 discussion I was no sure about my solution, and seems the solution introduces in this patch also inacceptable.

Currently I am not sure what is the best way to solve this problem =(
 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97535/new/

https://reviews.llvm.org/D97535



More information about the cfe-commits mailing list