[PATCH] D121423: [VFS] OverlayFileSystem should consistently use the last-added FS first

Ben Barham via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Mar 11 15:17:28 PST 2022


bnbarham added a comment.

In D121423#3376253 <https://reviews.llvm.org/D121423#3376253>, @dexonsmith wrote:

> I think my explanation was a bit muddled.
>
> - Before this patch:
>   - `cd /c` would have returned "directory not found". Clients think we're in `/`.
>   - `cat a/x` will give content aligning with `/a/x`.
>   - This seems correct.
> - After this patch:
>   - `cd /c` has no error. Clients think we're in `/c`.
>   - `cat a/x` will give content aligning with `/a/x/`.
>   - This seems incorrect. Since the `cd /c` succeeded, we should get `/c/a/x`.

To be clear, `setCWD` never failed on InMemoryFileSystem (except with the current change where it fails on empty).

So before this patch:

- `cd /c` succeeds, both FS get that CWD
- `cat a/x` looks up `/c/a/x` in #1 which fails and then gives #2

After this patch:

- Same thing since setCWD never fails when given an actual path

But this *is* a problem. Consider the same thing but with RedirectingFileSystem instead:

  overlay
    #1 /a/x -> /real/foo
    #2 /c/a/x -> /real/bar
    #3 real, assume foo and bar exist but /a and /c do not

Before this patch `setCWD` would immediately fail. So I guess no weirdness but also... still not great since I clearly do have a `/c`?

After this patch, `setCWD` fails on #1 and #3 but succeeds on #2, so the overlay now has `/c`. We then lookup `a/x` which gives `/a/x` instead of `/c/a/x`, which is likely what is expected.

Neither particularly make any sense here. I think the best option is probably what you suggested - store a bit representing whether or not setCWD failed and if we get any relative paths in, immediately return with no_such_file_or_directory if that bit is set.

To alleviate the "getting stuck" maybe we could find the first `setCWD` that succeeds, `getCWD`, and then re-run `setCWD` on all the rest to that CWD. Thoughts?



================
Comment at: llvm/lib/Support/VirtualFileSystem.cpp:468
+      Success = true;
+      CWD = *FS->getCurrentWorkingDirectory();
+    }
----------------
dexonsmith wrote:
> I think this will be easier to reason about if `CWD` is only updated on the first success inside a given call to `OverlayFileSystem::setCurrentWorkingDirectory`. (Note that different VFSes could be canonicalizing relative paths differently.)
> 
> I also wonder if, instead, we can just keep track of which FS currently has the canonical CWD:
> ```
> lang=c++
> uint32_t IndexForCWD = 0; // Starts as BaseFS.
> ```
> Calls to `getCurrentWorkingDirectory()` would forward to `overlays_begin()[IndexForCWD].getCurrentWorkingDirectory()`. This way, the overlay inherits the CWD canonicalization of the first working FS (usually, `BaseFS`).
Updating multiple times was definitely unintentional. I did consider keeping track of the first working FS. I decided against it since theoretically you could change the CWD of one of the underlying FS if you had a reference to it. That is:
```
OverlayFS->setCurrentWorkingDirectory("/foo");
ContainedFS->setCurrentWorkingDirectory("/bar");
OverlayFS->getCurrentWorkingDirectory(); // Should this be /foo or /bar?
```

Also note that `overlays_begin()` is the reverse order, ie. `BaseFS` is the last FS used (not the first). I'd think it'd actually be fairly uncommon for it to be the first working FS.


================
Comment at: llvm/lib/Support/VirtualFileSystem.cpp:472-475
+  // Theoretically all the filesystems could have failed with an unrelated
+  // error, but this seems the most reasonable error to return.
+  if (!Success)
+    return errc::no_such_file_or_directory;
----------------
dexonsmith wrote:
> The out-of-sync-ness here will remain hard to reason about if there is a sequence of `cd relative-path` calls.
> 
> I think if `errc::no_such_file_or_directory` is NOT returned (if there was success on any of the filesystems) then
> - A bit should be stored in its entry in the OverlayFS, which prevents passing subsequent relative paths to it, until it has a valid CWD again.
> - Even `setCurrentWorkingDirectory()` would not be forwarded for a relative CWD unless the filesystem has a valid CWD. It'd be possible to get "permanently lost" after successive relative path moves, such as `cd a/` (not found in FS2) followed by `cd ..` (can't move back!), but I think much easier to reason about. (You could also potentially recover from that, by "trying" to apply `..`s in the path and asking a working FS whether the two CWDs are equivalent.)
> 
> (I'm a bit scared of changing this to return `true` in new places until something like that is in place.)
> 
> 
Yes, out-of-sync-ness is a real issue. I'll expand on that in my other comment.

Note that the currently failing tests are related to this somehow. They set CWD to "." and then try to find a relative file in the InMemoryFileSystem. That used to work but is now failing. I'm a little confused by that because the particular sequence seems the same in both:
  - setCWD to "." // Does nothing to InMemoryFileSystem, RealFileSystem will continue to have the original path
  - add "a.cc" to InMemoryFileSystem
  - try to read "a.cc", with my changes it asks for the full path (which fails), since it worked previously I can only assume that originally it passed it in as relative, so I need to find why that is


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121423/new/

https://reviews.llvm.org/D121423



More information about the llvm-commits mailing list