[PATCH] D115232: [clangd] Indexing of standard library

Kadir Cetinkaya via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon May 16 08:42:36 PDT 2022


kadircet accepted this revision.
kadircet added a comment.
This revision is now accepted and ready to land.

sorry for the long turn around here, LGTM. let's ship it!



================
Comment at: clang-tools-extra/clangd/index/StdLib.cpp:233
+  // Sadly we can't use IndexOpts.FileFilter to restrict indexing scope.
+  // Files from outside the location may define true std symbols anyway.
+  // We end up "blessing" such headers, and can only do that by indexing
----------------
sammccall wrote:
> kadircet wrote:
> > what does `the location` refer to here? I think we should also stress the fact that even when indexing the same file, we have a good chance of seeing different symbols due to PP directives (and different std versions)
> > what does the location refer to here? 
> 
> It refers to the StdLibLocation Loc, made that explicit.
> 
> > I think we should also stress the fact that even when indexing the same file, we have a good chance of seeing different symbols due to PP directives (and different std versions)
> 
> Different than what? Do you mean "why might different calls to indexStandardLibrary see different symbols" from the same file?
> Different than what? Do you mean "why might different calls to indexStandardLibrary see different symbols" from the same file?
yes, i meant compared to a previous runs. but i don't think it's as relevant here. i believe i was thinking about caching indexing status across runs and using that cache to implement filefilter, so that we don't index the same file twice (as we normally do in bgindex).


================
Comment at: clang-tools-extra/clangd/index/StdLib.cpp:256
+  if (HadErrors) {
+    log("Errors when generating the standard library index, index may be "
+        "incomplete");
----------------
sammccall wrote:
> kadircet wrote:
> > i'd make this part of the next log
> Can you say why? I generally like one thought per line. Scanning vertically through familiar lines, it's easy to miss something unfamiliar tacked onto the end. This message should be rare, and log lines aren't precious.
> 
> (I reordered them, which seems a bit more natural)
i was rather implying to add it as a `(in)complete` field into the current log line you have. usually when clangd is printing lots of logs across threads it might be hard to correlate these. hence having them printed as a single log would help.


================
Comment at: clang-tools-extra/clangd/index/StdLib.cpp:313
+    llvm::StringRef DirPath = llvm::sys::path::parent_path(HeaderPath);
+    if (!HS.getFileMgr().getVirtualFileSystem().getRealPath(DirPath, Path))
+      SearchPaths.emplace_back(Path);
----------------
sammccall wrote:
> sammccall wrote:
> > kadircet wrote:
> > > why do we resolve the symlinks ?
> > Oops, because I read the documentation of getCanonicalPath() instead of the implementation, and they diverged in https://github.com/llvm/llvm-project/commit/dd67793c0c549638cd93cad1142d324b979ba682 :-D
> > 
> > Ultimately we're forming URIs to lexically compare with the ones emitted by the indexer, which uses getCanonicalPath(). Today getCanonicalPath() wants a FileEntry and we don't have one, but I think there's no fundamental reason for that, I can fix it.
> > 
> > (I'll do it as a separate patch, for now it's just calling getCanonicalPath with the wrong arguments)
> Actually, nevermind, the code is correct and I'd just forgotten why :-) Added a comment to StdLibLocation.
> 
> getCanonicalPath() does actually resolve symlinks and so on: it asks the FileManager for the directory entry and grabs the its "canonical name" which is just FS->getRealPath behind a cache.
> So the strings are going to match the indexer after all.
> 
> It'd be possible to modify getCanonicalPath() so we can call it here, but I don't think it helps. Calling it with (path, filemanager) is inconvenient for the (many) existing callsites, so it'd have to be a new overload just for this case. And the FileManager caching we'd gain doesn't matter here.
> I can still do it if you like, though.
> 
> (Also, relevant to your interests, realpath is probably taking care of case mismatches too!)
>So the strings are going to match the indexer after all.

thanks, this makes sense.

> It'd be possible to modify getCanonicalPath() so we can call it here, but I don't think it helps. Calling it with (path, filemanager) is inconvenient for the (many) existing callsites, so it'd have to be a new overload just for this case. And the FileManager caching we'd gain doesn't matter here.
> I can still do it if you like, though.

No need. We can take a look at that if the logic is likely to change (or get more complicated) in the future.


================
Comment at: clang-tools-extra/clangd/index/StdLib.h:66
+  // This function is threadsafe.
+  llvm::Optional<StdLibLocation> add(const LangOptions &, const HeaderSearch &);
+
----------------
sammccall wrote:
> kadircet wrote:
> > maybe drop the optinal and bail out in indexing when `Paths` are empty ?
> Why? This would definitely be using an empty vector as a sentinel value:
>  - 2 paths -> index
>  - 1 path -> index
>  - 0 paths -> don't index
> And it's not as if "probe for a standard library" is the main point of this function so the interpretation of the return value is obvious - that's only one of three criteria.
> 
> None seems to be a clearer way to communicate this than {}, and performance doesn't seem to be an issue here.
okay, makes sense.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115232/new/

https://reviews.llvm.org/D115232



More information about the cfe-commits mailing list