[PATCH] D115232: [clangd] Indexing of standard library

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Mar 30 13:58:01 PDT 2022


sammccall added a comment.

(sorry about the long delay, would still love to merge this)



================
Comment at: clang-tools-extra/clangd/index/StdLib.cpp:92
+  // The umbrella header is the same for all versions of each language.
+  // Headers that are unsupported in old lang versions are usually guarded by
+  // #if. Some headers may be not present in old stdlib versions, the umbrella
----------------
kadircet wrote:
> maybe move second half of this comment into `buildUmbrella` ?
I think that removes the context of why we're #including them.


================
Comment at: clang-tools-extra/clangd/index/StdLib.cpp:160
+    if (!S.IncludeHeaders.empty() &&
+        StandardHeaders.contains(S.IncludeHeaders.front().IncludeHeader)) {
+      GoodHeader[S.CanonicalDeclaration.FileURI] = true;
----------------
kadircet wrote:
> `StandardHeaders` are always in verbatim format, but are we sure if that holds for the `IncludeHeader` ?
> 
> I suppose that should be the case since we map known path suffixes back to a verbatim umbrella header, I just wonder how exhaustive the list is. (it might be nice to manually check no implementation detail headers falls out of cracks)
Right, I looked at these manually and the headers (and symbols) we were dropping seemed reasonable.

Note that we're **not** filtering symbols in this loop, just building a list of blessed files to **add** to the StdLibLocation.

So we only drop symbols that:
 - aren't recognized by the indexer as being in the standard library
 - aren't in the standard library directory
 - don't share a file with anything recognized as being in the standard library



================
Comment at: clang-tools-extra/clangd/index/StdLib.cpp:177
+  }
+  for (const auto &Good : GoodHeader)
+    if (Good.second)
----------------
kadircet wrote:
> seems like debugging artifact, either drop or put in `#ifndef NDEBUG`
This is useful for debugging, and fits well with the other dlog()s in this file, I'd like to check it in.

I was going to call you paranoid, being sure this would get compiled out.
but indeed not: https://godbolt.org/z/hKWhro3Mv (gcc manages it)

Added #ifndef NDEBUG


================
Comment at: clang-tools-extra/clangd/index/StdLib.cpp:204
+      !CI->getPreprocessorOpts().ImplicitPCHInclude.empty()) {
+    elog("Indexing standard library failed: bad CompilerInvocation");
+    assert(false && "indexing stdlib with a dubious CompilerInvocation!");
----------------
kadircet wrote:
> why log if we think we can't hit this case?
Because I'm not certain, and would much rather get a user bug report with this log line in it than without!

(Assert because I'd most rather find out before release of course)


================
Comment at: clang-tools-extra/clangd/index/StdLib.cpp:216
+  IgnoreDiagnostics IgnoreDiags;
+  CI->getPreprocessorOpts().clearRemappedFiles();
+  auto Clang = prepareCompilerInstance(
----------------
kadircet wrote:
> why are we doing this exactly? once we override the same file multiple times, I believe the last one takes precedence. it's definitely safer to clear the remapped files here rather than relying on that fact, but I am afraid of losing other mappings, possibly set outside of clangd's knowledge (e.g. through flags like `-remap-file`?)
We map dirty buffers ourselves. Conceivably, it may be part of the standard library itself that was remapped to some dirty buffer content!

We're reusing the CompilerInvocation from building a preamble, where we remap the main-file buffer to the dirty contents. (We do this in prepareCompilerInstance, but the PPOptions are shared).
This isn't part of the "compilerinvocation-as-proxy-for-build-flags" that we're trying to index.

I don't think it's a realistic possibility that anyone would rely on `-Xclang -remap-file` to find the standard library (note that it's a cc1 flag, not a public one...).


================
Comment at: clang-tools-extra/clangd/index/StdLib.cpp:233
+  // Sadly we can't use IndexOpts.FileFilter to restrict indexing scope.
+  // Files from outside the location may define true std symbols anyway.
+  // We end up "blessing" such headers, and can only do that by indexing
----------------
kadircet wrote:
> what does `the location` refer to here? I think we should also stress the fact that even when indexing the same file, we have a good chance of seeing different symbols due to PP directives (and different std versions)
> what does the location refer to here? 

It refers to the StdLibLocation Loc, made that explicit.

> I think we should also stress the fact that even when indexing the same file, we have a good chance of seeing different symbols due to PP directives (and different std versions)

Different than what? Do you mean "why might different calls to indexStandardLibrary see different symbols" from the same file?


================
Comment at: clang-tools-extra/clangd/index/StdLib.cpp:256
+  if (HadErrors) {
+    log("Errors when generating the standard library index, index may be "
+        "incomplete");
----------------
kadircet wrote:
> i'd make this part of the next log
Can you say why? I generally like one thought per line. Scanning vertically through familiar lines, it's easy to miss something unfamiliar tacked onto the end. This message should be rare, and log lines aren't precious.

(I reordered them, which seems a bit more natural)


================
Comment at: clang-tools-extra/clangd/index/StdLib.cpp:313
+    llvm::StringRef DirPath = llvm::sys::path::parent_path(HeaderPath);
+    if (!HS.getFileMgr().getVirtualFileSystem().getRealPath(DirPath, Path))
+      SearchPaths.emplace_back(Path);
----------------
kadircet wrote:
> why do we resolve the symlinks ?
Oops, because I read the documentation of getCanonicalPath() instead of the implementation, and they diverged in https://github.com/llvm/llvm-project/commit/dd67793c0c549638cd93cad1142d324b979ba682 :-D

Ultimately we're forming URIs to lexically compare with the ones emitted by the indexer, which uses getCanonicalPath(). Today getCanonicalPath() wants a FileEntry and we don't have one, but I think there's no fundamental reason for that, I can fix it.

(I'll do it as a separate patch, for now it's just calling getCanonicalPath with the wrong arguments)


================
Comment at: clang-tools-extra/clangd/index/StdLib.cpp:323
+      llvm::vfs::Status Stat;
+      if (!HS.getFileMgr().getNoncachedStatValue(Path, Stat) &&
+          Stat.isRegularFile())
----------------
kadircet wrote:
> any reason for going with `Noncached` version? (clangd doesn't set one up today, but not relying on that would be nice if we don't have a particular concern here)
Because the cached version is more complicated with no benefits:
 - any entanglement between the IO of the preamble indexing and that of the translation unit that happened to trigger it seems like a complicated idea, that's worth understanding before doing
 - as you say, we don't actually install a statcache, so there's no concrete benefit
 - in fact there **exists no caching implementation** of FileSystemStatCache, so the idea that we might be able to implement that interface and gain benefits is *extremely* speculative



================
Comment at: clang-tools-extra/clangd/index/StdLib.h:66
+  // This function is threadsafe.
+  llvm::Optional<StdLibLocation> add(const LangOptions &, const HeaderSearch &);
+
----------------
kadircet wrote:
> maybe drop the optinal and bail out in indexing when `Paths` are empty ?
Why? This would definitely be using an empty vector as a sentinel value:
 - 2 paths -> index
 - 1 path -> index
 - 0 paths -> don't index
And it's not as if "probe for a standard library" is the main point of this function so the interpretation of the return value is obvious - that's only one of three criteria.

None seems to be a clearer way to communicate this than {}, and performance doesn't seem to be an issue here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115232/new/

https://reviews.llvm.org/D115232



More information about the cfe-commits mailing list