[PATCH] D98242: [clangd] Store system relative includes as verbatim

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Mar 9 03:25:08 PST 2021


sammccall added inline comments.


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:801
+  if (IsSystem)
+    return "<" + ShorterInclude + ">";
   // Standard case: just insert the file itself.
----------------
This is a great heuristic, now I'm thinking of all the ways it can go wrong...

---

It could be slow: it's doing a string comparison for every include path entry (not just the -isystems) and that can be a long list for big codebases. We're calling this for most symbols we encounter, so in principle this looks quadratic in codebase size when indexing a preamble.

It's tempting to throw a cache on it but actually... the callsite looks like:
```
foreach header:
  foreach symbol:
    adjust(getIncludeHeader(symbol, header));
```

getIncludeHeader() is entirely dependent on the header almost ignores the symbol, so hoisting it out of the loop would be as good as caching it.

The wrinkle is the std::move hack, but we can move *that* part alone inside the loop instead.

---

It could fail to translate properly between paths, being inserted where the -isystem wasn't available. Often this means the *library* isn't available in this part of the codebase, so really the problem is offering the symbol at all (though using verbatim headers may make this harder to fix). Dynamic indexes spanning multiple projects are another case, but I think a marginal one. I don't think we need to address this.

---

It may give inconsistent results between TUs for the same symbol.
When a definition is indexed, it should win, so the problematic cases are:
 - header-only symbols (or no definition indexed) with inconsistent -isystem. This is not terribly important, as above I think.
 - definition is missing -isystem, because *within* the library headers are not considered "system". This seems likely to be common, and I'm not sure how to address it (other than giving verbatim headers priority always, which seems dubious). It's not a regression though (just the bug doesn't get fixed for these symbols.)
 - definition has -isystem (library impl considers its own headers as system) but other files don't. Also not sure how to solve this but it seems less likely to be common to me.

One design around these problems would be to store *both* a preferred spelling and a filename, and to use the spelling if it can be resolved. But this is invasive, means doing stat()s, doesn't work well with genfiles... I'm not really sure of a clean solution here.

Maybe we just move ahead as-is and see if we hit problems...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98242/new/

https://reviews.llvm.org/D98242



More information about the cfe-commits mailing list