[PATCH] D98242: [clangd] Store system relative includes as verbatim

Kadir Cetinkaya via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Mar 9 12:20:24 PST 2021


kadircet added inline comments.


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:801
+  if (IsSystem)
+    return "<" + ShorterInclude + ">";
   // Standard case: just insert the file itself.
----------------
sammccall wrote:
> This is a great heuristic, now I'm thinking of all the ways it can go wrong...
> 
> ---
> 
> It could be slow: it's doing a string comparison for every include path entry (not just the -isystems) and that can be a long list for big codebases. We're calling this for most symbols we encounter, so in principle this looks quadratic in codebase size when indexing a preamble.
> 
> It's tempting to throw a cache on it but actually... the callsite looks like:
> ```
> foreach header:
>   foreach symbol:
>     adjust(getIncludeHeader(symbol, header));
> ```
> 
> getIncludeHeader() is entirely dependent on the header almost ignores the symbol, so hoisting it out of the loop would be as good as caching it.
> 
> The wrinkle is the std::move hack, but we can move *that* part alone inside the loop instead.
> 
> ---
> 
> It could fail to translate properly between paths, being inserted where the -isystem wasn't available. Often this means the *library* isn't available in this part of the codebase, so really the problem is offering the symbol at all (though using verbatim headers may make this harder to fix). Dynamic indexes spanning multiple projects are another case, but I think a marginal one. I don't think we need to address this.
> 
> ---
> 
> It may give inconsistent results between TUs for the same symbol.
> When a definition is indexed, it should win, so the problematic cases are:
>  - header-only symbols (or no definition indexed) with inconsistent -isystem. This is not terribly important, as above I think.
>  - definition is missing -isystem, because *within* the library headers are not considered "system". This seems likely to be common, and I'm not sure how to address it (other than giving verbatim headers priority always, which seems dubious). It's not a regression though (just the bug doesn't get fixed for these symbols.)
>  - definition has -isystem (library impl considers its own headers as system) but other files don't. Also not sure how to solve this but it seems less likely to be common to me.
> 
> One design around these problems would be to store *both* a preferred spelling and a filename, and to use the spelling if it can be resolved. But this is invasive, means doing stat()s, doesn't work well with genfiles... I'm not really sure of a clean solution here.
> 
> Maybe we just move ahead as-is and see if we hit problems...
for latency concerns, i was mostly turning a blind eye as this would only happen once per indexing. but you are right, we can get away from any possible regressions by moving it to per-header layer.

---

> It could fail to translate properly between paths, being inserted where the -isystem wasn't available. Often this means the *library* isn't available in this part of the codebase, so really the problem is offering the symbol at all (though using verbatim headers may make this harder to fix). Dynamic indexes spanning multiple projects are another case, but I think a marginal one. I don't think we need to address this.

as you mentioned this is likely to show up either when there's a remote-index/static-index built on another machine in use or dynamic index with multiple projects.
- for remote-index/static-index case, i think it is unlikely to show up, as the symbol should be available during local development too.
-for dynamic index spanning multiple indexes though, previously we would just suggest a wrong symbol and now we'll also perform a wrong header insertion so it is sort-of worse. but as you mentioned this should probably be handled on symbol level as well.

my mental model around the fix was rather making use of the declaration/definition location of the symbol, which is guaranteed to be a URI, rather than a symbolic representation of the "owning" header. hence i feel like this shouldn't make things harder when we are fixing that issue.

> It may give inconsistent results between TUs for the same symbol.

I didn't consider this case thoroughly, as I was assuming system search paths to be same for the whole codebase (leaving multiple-projects case out).

> header-only symbols (or no definition indexed) with inconsistent -isystem. This is not terribly important, as above I think.

well, in theory we just accumulate include headers in this case, which means we can still recover somehow if it turns out to be important.

> definition is missing -isystem, because *within* the library headers are not considered "system". This seems likely to be common, and I'm not sure how to address it (other than giving verbatim headers priority always, which seems dubious). It's not a regression though (just the bug doesn't get fixed for these symbols.)

ugh that's true :/ i am not really sure what's the convention for these in libraries themselves, but even with clangd if we somehow end up indexing definition locations for those symbols and user triggers a go-to, clangd will likely use fallback commands to index those files, which will be missing `-isystem` like flags. but as you mentioned this just means some cases are left unfixed.

as for solution, the best i can think of is we keep storing URIs for include headers, in all cases and never store verbatim. then we figure out the spelled include at use time, while making use of headersearchinfo and URI specific "get the spelling" logic (we probably want to have some other mechanism for the latter).
but it means we perform the resolution on every code completion item (which we already do, when the stored include isn't verbatim and uri spelling doesn't yield anything), so this shouldn't terribly regress completion latency.

basically what i am suggesting is propagating headersearchinfo into `URI::getIncludeSpelling`, which is more invasive then this change and conceptually doesn't sound so appealing. here's another terrible idea :), maybe we should have a module hook instead for generating custom include spellings ? (but we don't want to iterate over all modules on every completion i suppose, so this will definitely require some extra tricks like fetching a functor from the module, also there's the question of multiple modules implementing the hook..)

so yeah this is definitely more invasive, hence i'd rather go with the solution proposed here until we see some complaints around this specific workflow, wdyt?

> definition has -isystem (library impl considers its own headers as system) but other files don't. Also not sure how to solve this but it seems less likely to be common to me.

i think the approach mentioned above should help with this too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98242/new/

https://reviews.llvm.org/D98242



More information about the cfe-commits mailing list