[PATCH] D109128: [VFS] Use original path when falling back to external FS

Duncan P. N. Exon Smith via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Sep 14 09:14:41 PDT 2021


dexonsmith added a comment.



In D109128#2997588 <https://reviews.llvm.org/D109128#2997588>, @JDevlieghere wrote:

> Keith and I discussed this offline. My suggestion was to do the following:
>
> 1. Check the overlay for the canonicalized path
> 2. Check the fall-through for the canonicalized path
> 3. Check the fall-through for the original path

I'm not sure it's correct to do (3) at all...

In D109128#2997787 <https://reviews.llvm.org/D109128#2997787>, @keith wrote:

> In D109128#2997588 <https://reviews.llvm.org/D109128#2997588>, @JDevlieghere wrote:
>
>> If I understand correctly, this patch does that, but swaps 2 and 3. What's the motivation for trying the non-canonical path first? If we treat the current working directory as a property directory (which is what the canonicalization is all about), then it seems like we should exhaust those options first.
>
> Using the canonical path first is the root of the problem, using that first the behavior to return the absolute path virtually always, for example in my new test these assertions

I think the problem is that we've been conflating two separate things:

- The path used internally to find the filesystem object.
- The path reported back to the user.

I assert that the path used internally to find the filesystem object should always/only be the canonical path.

1. Check the overlay for the canonicalized path.
2. Check the fall-through for the canonicalized path.

But the path reported back to the user should almost always be the original path used for lookup. ONLY when the path was found in the overlay with "use-external-names" should it be modified at all.

I think this could be implemented by with the help of adding the following APIs to `vfs::File`:

  class File {
  public:
    // Get the same file wrapped with a different reported path.
    static std::unique_ptr<File> getWithPath(std::unique_ptr<File> F,
                                             const Twine &P);
  
  protected:
    // API for mutating the path. Override to avoid needing a wrapper
    // object for File::getWithPath.
    virtual bool setPath(const Twine &Path) { return false; }
  };

then getWithPath can be:

  namespace {
  class WrapperFile : public File { /* Change the observed path */ };
  }
  
  std::unique_ptr<File> File::getWithPath(std::unique_ptr<File> F, const Twine &P) {
    if (F->setPath(P))
      return F;
    return std::make_unique<WrapperFile>(F, P);
  }

Most the in-tree `vfs::File` derived classes can implement `setPath` so there shouldn't be many extra heap objects in practice.

@JDevlieghere / @keith, WDYT?



================
Comment at: llvm/lib/Support/VirtualFileSystem.cpp:2025-2028
 ErrorOr<std::unique_ptr<File>>
-RedirectingFileSystem::openFileForRead(const Twine &Path_) {
-  SmallString<256> Path;
-  Path_.toVector(Path);
+RedirectingFileSystem::openFileForRead(const Twine &Path) {
+  SmallString<256> CanonicalPath;
+  Path.toVector(CanonicalPath);
----------------
With the new comment, I think it'd be useful to have SmallString versions of path the original and canonical paths. In which case:
```
lang=c++
SmallString<256> Path;
Path_.toVector(Path);
SmallString<256> CanonicalPath = Path;
```
(with the original `Path_` for the parameter) seems cleanest?


================
Comment at: llvm/lib/Support/VirtualFileSystem.cpp:2036-2040
+      auto Result = ExternalFS->openFileForRead(Path);
+      if (!Result.getError())
+        return Result;
+
+      return ExternalFS->openFileForRead(CanonicalPath);
----------------
I think the correct logic here is something like:
```
lang=c++
if (auto Result = ExternalFS->openFileForRead(CanonicalPath))
  return Result->getPath() == CanonicalPath && Path != CanonicalPath
      ? File::getWithPath(*Result, Path)
      : *Result;
else
  return Result.getError();
```

- Ensures soundness in terms of getting the right file by always using the canonical path.
- If the fallthrough filesystem changes the name (e.g., if it's another redirecting filesystem with use-external-names) then passes that through.
- If the fallthrough filesystem reports back the name it was given, then passes back the name this function was given (matches `stat` behaviour) -- by induction, if all underlying filesystems are following a similar rule, then only if use-external-names=true was hit would the path be changed at all.


================
Comment at: llvm/lib/Support/VirtualFileSystem.cpp:2053-2059
+    if (shouldFallBackToExternalFS(ExternalFile.getError(), Result->E)) {
+      auto Result = ExternalFS->openFileForRead(Path);
+      if (!Result.getError())
+        return Result;
+
+      return ExternalFS->openFileForRead(CanonicalPath);
+    }
----------------
Can this logic be shared, with the above, maybe put in a lambda?


================
Comment at: llvm/lib/Support/VirtualFileSystem.cpp:2069
   Status S = getRedirectedFileStatus(
-      Path, RE->useExternalName(UseExternalNames), *ExternalStatus);
+      CanonicalPath, RE->useExternalName(UseExternalNames), *ExternalStatus);
   return std::unique_ptr<File>(
----------------
I think `Path` should be sent in here, not `CanonicalPath`. Only if use-external-names is set should the observed path be canonicalized here... and in that case, the argument will be ignored anyway.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109128/new/

https://reviews.llvm.org/D109128



More information about the cfe-commits mailing list