[PATCH] D106064: [clang][deps] Normalize paths in minimizing file system

Duncan P. N. Exon Smith via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Jul 19 10:07:36 PDT 2021


dexonsmith added inline comments.


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:161-162
+    const StringRef RawFilename) {
+  llvm::SmallString<256> Filename;
+  llvm::sys::path::native(RawFilename, Filename);
+
----------------
jansvoboda11 wrote:
> dexonsmith wrote:
> > I'm a bit nervous about the impact of modifying the input filename on Windows before passing it into other APIs. This could change behaviour of lower layers of the VFS (since they'll see a different filename than when DependencyScanningWOrkerFileSystem is NOT on top of them).
> > 
> > Can we restrict this just to what's passed to IgnoredFiles? (Maybe add `shouldIgnore()` API, which returns `false` if the set is empty, and then locally converts to native and checks for membership...)
> > 
> > It also seems wasteful to be calling `sys::path::native` and the memcpy all the time, when usually it has no effect. Have you checked whether this affects performance of scanning something big?
> Yeah, I can see that path changing between VFS layers can be problematic. I'm pretty sure we can get away with only converting `Filename` to its native form when interacting with `IgnoredFiles`.
> 
> I haven't checked the performance impact. If it ends up being measurable, I could implement something like `sys::path::is_native` and avoid the copy most of the time on unix-like OSes. WDYT?
Probably it'll end up not being measurable, but if it is, something like `is_native` might help... that said, if this will eventually be replaced with logic relyin on fs::UniqueID it might not be worth optimizing.


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:171-172
 
   bool KeepOriginalSource = IgnoredFiles.count(Filename) ||
                             !shouldMinimize(Filename);
   DependencyScanningFilesystemSharedCache::SharedFileSystemEntry
----------------
jansvoboda11 wrote:
> dexonsmith wrote:
> > Looking at this, makes me wonder if this is just fixing a specific instance of a more general problem.
> > 
> > Maybe `IgnoredFiles` should be a set of `FileEntry`s instead of `StringRef`s... but that'd create a different performance bottleneck when the set is big, since creating the FileEntrys would be expensive. We'd want the FileEntry lookup to be globally cached / etc. -- and FileManager isn't quite safe to use globally.
> > 
> > Do you think IgnoredFiles as-is will work well enough for where it'll be used for PCH? Or do we need to catch headers referenced in two different ways somehow?
> I think we could use `llvm::sys::fs::UniqueID` instead of the filename to refer to files. Since the VFS layer resolves symlinks when stat-ing a file, that should be a canonical file identifier. I can tackle that in a follow up patch.
Yup, a unique ID should work for a file identifier.

I'm concerned about the cost of looking up the unique ID — avoiding stat traffic was measured to be an important performance benefit in the dependency scanner model.

To avoid a perf regression, I think you could use caches like:
- ids: filename -> unique-id
- originals: unique-id -> original file content
- minimized: unique-id -> minimized file content

Where "ids" and "originals" are read/cached in lock-step when accessing a filename, additionally computing "minimized" if not in the ignore-list. (Adding a file to the ignore-list would put content in "ids" and "originals".)

The goal is to amortize the `stat` cost across the lifetime of the service while ensuring a consistent view of the file content.

WDYT?

... regardless I think all of this is out of scope for the current patch, which is still useful for unblocking adding tests to the subsequent patches in the stack.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106064/new/

https://reviews.llvm.org/D106064



More information about the cfe-commits mailing list