[PATCH] D136651: [Clang] Give Clang the ability to use a shared stat cache

Frederic Riss via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 24 20:13:05 PDT 2022


friss added a comment.

Thanks for the initial feedback!

> Mostly just skimmed so far, will hopefully have some time to look at this more tomorrow. Out of interest, do you have any performance numbers using this change? I assume this mainly impacts implicit modules (since I suspect we'd also be opening the file as well anyway), is that true?

You're correct that this overhead has been measured on implicit module builds. As I mentioned in the commit message this saves over 20% of the overall built time in some cases. This time is split between module validation (which could be skipped) and HeaderSearch (which cannot be skipped).



================
Comment at: clang/lib/Frontend/CompilerInvocation.cpp:5115-5130
+  for (const auto &File : CI.getHeaderSearchOpts().VFSStatCacheFiles) {
+    llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> Buffer =
+        llvm::MemoryBuffer::getFile(File);
+    if (!Buffer) {
+      Diags.Report(diag::err_missing_vfs_stat_cache_file) << File;
+      continue;
+    }
----------------
bnbarham wrote:
> IMO VFS overlays being modeled like this is a mistake and makes reasoning/testing about them fairly difficult. I have a PR up https://reviews.llvm.org/D121423 to change `OverlayFileSystem` to make more sense and be a real overlay rather than what it is now. If I finish that one off, how would you feel about changing the behavior of `StatCacheFileSystem` to just immediately error if it doesn't contain the file, rather than proxy (and thus chain) as it does now?
> 
> So for multiple files we'd then have:
>   - OverlayFileSystem
>     - StatCacheFileSystem
>     - StatCacheFileSystem
>     - RealFileSystem
> 
> Then any non-stat or exists call would return `llvm::errc::no_such_file_or_directory` and then the next FS would be used instead.
> 
> I don't think this *really* matters for `StatCacheFileSystem`, so I'm fine if you'd rather not wait for me to change `OverlayFileSystem`. I can make the changes myself after getting my PR in.
I don't think that's really doable if you want to keep the ability to cache negative hits. If a miss always results in a query to the real filesystem, then you're not saving the `stat` call. A lot of the time is spent in HeaderSearch which queries a similar number of non-existing and existing files.
But I'm not dead set on this. I also haven't spent a lot of time thinking about your proposal.


================
Comment at: llvm/lib/Support/VirtualFileSystem.cpp:2959-2960
+    // the pathis different. The canonicalization that the call to remove_dots()
+    // does leaves only '..' with symlinks as a source of confusion. If the path
+    // does not contain '..' we can safely say it doesn't exist.
+    if (std::find(sys::path::begin(SuffixPath), sys::path::end(SuffixPath),
----------------
bnbarham wrote:
> This sentence is a little confusing to me. `remove_dots` just leaves `..` unless you pass `remove_dot_dot` (but it doesn't do any checking). IMO just the `If the path does not contain '..' we can safely say it doesn't exist.` is enough.
`remove_dots` does more than remove dots. It is confusing, but it also removes excess separators (see the `Canonical` unit test). This means that the cache will work for /path/to/cache/file/a as well as /path/to/cache/file///a and /path/to/cache/file/././a. There are basically infinite spellings of a path just by adding `.` and additional separators.
`..` is interesting because it's not semantically preserving to remove them in the presence of symlinks.
I'm fine with simplifying the description, but that is the nuance I tried to convey.


================
Comment at: llvm/lib/Support/VirtualFileSystem.cpp:2961
+    // does not contain '..' we can safely say it doesn't exist.
+    if (std::find(sys::path::begin(SuffixPath), sys::path::end(SuffixPath),
+                  "..") == sys::path::end(SuffixPath)) {
----------------
bnbarham wrote:
> FWIW `StringRef` has a `contains`
But that wouldn't be correct. Here we are looking for a path component which is `..`. A simple text search would fire on a filename containing `..`. I think this search on components is the only correct way to do this.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136651/new/

https://reviews.llvm.org/D136651



More information about the llvm-commits mailing list