[PATCH] D121423: [VFS] OverlayFileSystem should consistently use the last-added FS first

Duncan P. N. Exon Smith via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Mar 11 14:38:54 PST 2022


dexonsmith added a comment.

This seems like a great thing to do. I have some comments

In D121423#3375818 <https://reviews.llvm.org/D121423#3375818>, @bnbarham wrote:

> `clang-tidy/infrastructure/empty-database.cpp` is failing because `setCurrentWorkingDirectory("")` previously returned an error and no longer does.
>
> It would have run on the RealFileSystem with a *process linked* CWD previously, which really just runs a `chdir`. chdir errors if the path doesn't exist. It now successfully sets on the InMemoryFileSystem since it checks nothing.

Is this behaviour on an empty string consistent across platforms?

> This raises a few couple interesting questions:
>
> 1. The implementation of `setCurrentWorkingDirectory` is different for the non-process linked case. It first makes the path absolute and then errors if that path doesn't exist. So if that was being used, we also wouldn't error here (it would effectively be a no-op).

I feel like there are a few reasonable options.

- Special case empty string in the VFS as meaning the same thing as `.`.
- Special case empty string in the VFS as always given an error.
- Try to match behaviour of the underlying OS (if they're all the same, that's the same as one of the above).

IMO, `cd ""` is not a very interesting case, but it'd be nice for it to be consistent.

> 2. Should InMemoryFileSystem fail to set the CWD if the directory doesn't exist? I was considering doing this before I noticed this test failing, since it seems inconsistent with the other implementation. Thoughts?

My guess is that it doesn't error partly to align with the behaviour in OverlayFileSystem, so updating it as you've done seems fine to me, assuming we get the CWD working correctly in OverlayFS.

However, I'm concerned that the combination of the changes here will make the following do the wrong thing:

  overlay:
    in-memory #1: (base)
      /a/x
    in-memory #2:
      /c/a/x

operations:

  cd /
  cd /c
  cat a/x

IIUC the combined effect of your changes, this "should" print `/c/a/x` from in-memory-v2, but instead it'll print `/a/x` from in-memory-v1. And I think before this patch, it would have done the right thing.



================
Comment at: llvm/lib/Support/VirtualFileSystem.cpp:468
+      Success = true;
+      CWD = *FS->getCurrentWorkingDirectory();
+    }
----------------
I think this will be easier to reason about if `CWD` is only updated on the first success inside a given call to `OverlayFileSystem::setCurrentWorkingDirectory`. (Note that different VFSes could be canonicalizing relative paths differently.)

I also wonder if, instead, we can just keep track of which FS currently has the canonical CWD:
```
lang=c++
uint32_t IndexForCWD = 0; // Starts as BaseFS.
```
Calls to `getCurrentWorkingDirectory()` would forward to `overlays_begin()[IndexForCWD].getCurrentWorkingDirectory()`. This way, the overlay inherits the CWD canonicalization of the first working FS (usually, `BaseFS`).


================
Comment at: llvm/lib/Support/VirtualFileSystem.cpp:472-475
+  // Theoretically all the filesystems could have failed with an unrelated
+  // error, but this seems the most reasonable error to return.
+  if (!Success)
+    return errc::no_such_file_or_directory;
----------------
The out-of-sync-ness here will remain hard to reason about if there is a sequence of `cd relative-path` calls.

I think if `errc::no_such_file_or_directory` is NOT returned (if there was success on any of the filesystems) then
- A bit should be stored in its entry in the OverlayFS, which prevents passing subsequent relative paths to it, until it has a valid CWD again.
- Even `setCurrentWorkingDirectory()` would not be forwarded for a relative CWD unless the filesystem has a valid CWD. It'd be possible to get "permanently lost" after successive relative path moves, such as `cd a/` (not found in FS2) followed by `cd ..` (can't move back!), but I think much easier to reason about. (You could also potentially recover from that, by "trying" to apply `..`s in the path and asking a working FS whether the two CWDs are equivalent.)

(I'm a bit scared of changing this to return `true` in new places until something like that is in place.)




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121423/new/

https://reviews.llvm.org/D121423



More information about the llvm-commits mailing list