[PATCH] D92160: [clang] Fix wrong FDs are used for files with same name in Tooling

Duncan P. N. Exon Smith via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Dec 10 17:55:31 PST 2020


dexonsmith added subscribers: Bigcheese, jansvoboda11, arphaman.
dexonsmith added a comment.

In D92160#2444732 <https://reviews.llvm.org/D92160#2444732>, @OikawaKirie wrote:

> - In my solution, I use a pretty straightforward approach, which is to have an individual cache (for `FileEntry` and any other related things) for each working directory, and switch caches between working directories.

IIUC, this means the absolute paths won't be cached between changing working directories. Is that correct?

> - In your alternative solution, if I standerstand it correctly, `dropRelativePaths` is called in `notifyCurrentWorkingDirectoryChange`. All those `FileEntry` which are based on relative paths are dropped from the cache. I think this is a good idea, especially for keeping `FileManager` simple. However I'm not sure how many files are based on relative paths. If `dropRelativePaths` drops too many files and the working directory is switched back to a previous one (this could happen when analyzing with a `compile_commands.json`), it might result in many cache misses, thus involving more system calls to open files.

Yes, I think that's the tradeoff. My sense is that `FileManager` is usually/often fed absolute paths, but it likely depends on the build system and environment.

> There was an another solution I have ever tried, without things like `notifyCurrentWorkingDirectoryChange`. The solution is still straightforward, which is to use absolute path (instead of filename) to query/store a `FileEntry` from/to the cache. Actually I have tried this before but it ended up with lots of test cases failed. I don't know where I did wrong. If you think this approach is okay, I will continue working on this, and it might take some time.

I'm not surprised some tests failed with this but it might be worth looking deeper to understand current expectations. It might be valuable to look through Git history to find why we're not always making the paths absolute right now. We do it sometimes (see calls to `FixupRelativePath` and `makeAbsolutePath`) but not always. Why is that the case? What invariants do we need to keep functioning?

Another point is that `FileManager` takes the working directory at construction time via `FileSystemOptions`. There's a FIXME to remove that argument, but (as you've seen) the assumption is baked in that the CWD doesn't change. I think the current patch may not completely fix it, since you can't modify `FileSystemOptions`.

One idea I have (not fully fleshed out) is to drop FileSystemOpts as a constructor argument (instead grab / store the CWD from the VFS), and, inspired by your approach, split the data structures between relative and absolute paths. The existing data structures would only store absolute paths, but there are new ones for relative paths that can be cached and swapped out. Something like this:

  /// Current working directory. Grabbed from the (new) VFS whenever it's
  /// changed, and updated if the file manager is notified that the VFS's
  /// CWD changes.
  std::string CWD;
  
  struct RelativePaths {
    StringMap<FileEntryRef::MapValue> Files;
    StringMap<llvm::ErrorOr<DirectoryEntry &>> Dirs;
  };
  
  /// Relative entries for the current CWD.
  RelativePaths RelativeEntries;
  
  /// Cached relative entries for other CWDs.
  Optional<StringMap<RelativePaths>> CachedRelativeEntries;

The idea is that the existing data structures would now be restricted to absolute paths. Any query that uses a relative path checks in `RelativeEntries`; if nothing is there, the path is fixed up to an absolute path, inserted / looked up in the main data structures, and then an alias is added to `RelativeEntries`. If the CWD changes (atypical case, but possible), we do a similar cache dance to the one in your current patch.

@arphaman @Bigcheese, can you share whether we need to solve similar issues in clang-scan-deps? What approach is taken there? Do we just require absolute paths? (I know currently we don't reuse a FileManager, but my recollection is we had a prototype that mostly worked...)

Another thing to look at: is this a problem for implicit modules builds (the `CompilerInstance` that builds the module will share a `FileManager`, but use a different CWD)? Why or why not? (Again, maybe we just require absolute paths when doing implicit modules...)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92160/new/

https://reviews.llvm.org/D92160



More information about the cfe-commits mailing list