[PATCH] D106064: [clang][deps] Normalize paths in minimizing file system

Jan Svoboda via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Jul 19 06:55:45 PDT 2021


jansvoboda11 added inline comments.


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:161-162
+    const StringRef RawFilename) {
+  llvm::SmallString<256> Filename;
+  llvm::sys::path::native(RawFilename, Filename);
+
----------------
dexonsmith wrote:
> I'm a bit nervous about the impact of modifying the input filename on Windows before passing it into other APIs. This could change behaviour of lower layers of the VFS (since they'll see a different filename than when DependencyScanningWOrkerFileSystem is NOT on top of them).
> 
> Can we restrict this just to what's passed to IgnoredFiles? (Maybe add `shouldIgnore()` API, which returns `false` if the set is empty, and then locally converts to native and checks for membership...)
> 
> It also seems wasteful to be calling `sys::path::native` and the memcpy all the time, when usually it has no effect. Have you checked whether this affects performance of scanning something big?
Yeah, I can see that path changing between VFS layers can be problematic. I'm pretty sure we can get away with only converting `Filename` to its native form when interacting with `IgnoredFiles`.

I haven't checked the performance impact. If it ends up being measurable, I could implement something like `sys::path::is_native` and avoid the copy most of the time on unix-like OSes. WDYT?


================
Comment at: clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp:171-172
 
   bool KeepOriginalSource = IgnoredFiles.count(Filename) ||
                             !shouldMinimize(Filename);
   DependencyScanningFilesystemSharedCache::SharedFileSystemEntry
----------------
dexonsmith wrote:
> Looking at this, makes me wonder if this is just fixing a specific instance of a more general problem.
> 
> Maybe `IgnoredFiles` should be a set of `FileEntry`s instead of `StringRef`s... but that'd create a different performance bottleneck when the set is big, since creating the FileEntrys would be expensive. We'd want the FileEntry lookup to be globally cached / etc. -- and FileManager isn't quite safe to use globally.
> 
> Do you think IgnoredFiles as-is will work well enough for where it'll be used for PCH? Or do we need to catch headers referenced in two different ways somehow?
I think we could use `llvm::sys::fs::UniqueID` instead of the filename to refer to files. Since the VFS layer resolves symlinks when stat-ing a file, that should be a canonical file identifier. I can tackle that in a follow up patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106064/new/

https://reviews.llvm.org/D106064



More information about the cfe-commits mailing list