[PATCH] Add a VFSFromYAML class and a parser to create it

Ben Langmuir blangmuir at apple.com
Thu Feb 20 10:02:55 PST 2014


  > I think it is better to move the YAML VFS description to a new file under 'docs/'. This file format is API, and should be documented.

  Can this wait until we have more practical experience with the file format? Things in docs/ tend to have better documentation that what I've written here, and I don't want to put a lot of effort into this until we're sure this is the right format.

  > Unrelated to this patch, but how do we keep track of the current working directory? Or we don't just yet?

  We don't :)  I'll look at whether that is simple to do now, or can wait.  It won't be exercised until we use the VFS outside of FileManager, since FileManager always creates absolute paths I think.

  > We also need to document how our case-insensitive matching interacts with Unicode.

  Right now it doesn't (I'm using StringRef::equals_lower, which is ASCII-based).  What is clang/llvm's story for unicode path handling? Can we expect a specific encoding, or does everyone who interacts with a path have to figure it out.


================
Comment at: lib/Basic/VirtualFileSystem.cpp:205
@@ +204,3 @@
+
+/// \brief Represents the value of a 'type' field.
+enum EntryKind {
----------------
Dmitri Gribenko wrote:
> This comment is not really helpful...
Will remove.

================
Comment at: lib/Basic/VirtualFileSystem.cpp:254
@@ +253,3 @@
+///   <optional configuration>
+///   'roots': [
+///              <directory entries>
----------------
Dmitri Gribenko wrote:
> If we want 'roots' to model Windows drives and similar things, then we also need some kind of label for each root.
Why? I'm not intimately familiar with Windows file systems.  I thought that the 'name' field would be sufficient.

================
Comment at: lib/Basic/VirtualFileSystem.cpp:252-257
@@ +251,8 @@
+/// The basic structure of the parsed file is:
+/// {
+///   <optional configuration>
+///   'roots': [
+///              <directory entries>
+///            ]
+/// }
+///
----------------
Dmitri Gribenko wrote:
> Please use \verbatim ... \endverbatim.  Doxygen HTML output looks better that way.
Will do.

================
Comment at: lib/Basic/VirtualFileSystem.cpp:260
@@ +259,3 @@
+/// Possible configuration settings are
+///   'case-sensitive': <boolean>
+///
----------------
Dmitri Gribenko wrote:
> Is it optional?  If so, what is the default?
> 
> Please also add a mandatory version field, and refuse to load files with incorrect version.  I hope we will never have to bump the version and evolve the format in a compatible way, but who knows...
It is optional, which is mentioned above.  I will specify its default value.

The version is going to make my unit tests ugly, but I see your point.

================
Comment at: lib/Basic/VirtualFileSystem.cpp:280
@@ +279,3 @@
+///   'name': <string>,
+///   'external-contents': <path to external file>)
+/// }
----------------
Dmitri Gribenko wrote:
> Stray ')'
> 
> While in file format 'external-contents' looks unambiguous, in source code
>   StringRef getExternalContents()
> and similar are not completely clear: does it return a path or the actual contents?  Maybe rename all similar variables and functions in source code to *ExternalContentsPath*?
Will do.

================
Comment at: lib/Basic/VirtualFileSystem.cpp:290
@@ +289,3 @@
+  /// \brief The file system to use for 'external-contents'.
+  IntrusiveRefCntPtr<FileSystem> ExternalFS; ///< The file system to use for
+
----------------
Dmitri Gribenko wrote:
> ... to use for *external references*?
Sure. I'll also delete the extra ///< comment

================
Comment at: lib/Basic/VirtualFileSystem.cpp:302-303
@@ +301,4 @@
+      : ExternalFS(ExternalFS) {
+    llvm::Triple Triple(LLVM_HOST_TRIPLE);
+    CaseSensitive = Triple.isOSDarwin() || Triple.isOSWindows();
+  }
----------------
Dmitri Gribenko wrote:
> I don't think it is a good idea to make this decision here, in a low-level library, especially by calling into LLVM.  Having a single default of 'case-sensitive' is better, IMHO -- at least, VFS mapping files will be interpreted the same way on all platforms.
But then the VFS file is interpreted differently from the host file system by default.  That seems just as bad.  For example, say I map over /Foo, on Darwin, and the user specifies a path in /foo.  Then we will get the real file system.

================
Comment at: lib/Basic/VirtualFileSystem.cpp:306-307
@@ +305,4 @@
+
+  /// \brief Looks up \p Path in \p Roots.
+  ErrorOr<Entry *> lookupPath(const Twine &Path);
+
----------------
Dmitri Gribenko wrote:
> Nitpick: it is not semantically correct to refer to Roots as \p, because it is not a parameter.
Good catch, I'll switch to \c.

================
Comment at: lib/Basic/VirtualFileSystem.cpp:321
@@ +320,3 @@
+  /// returns a virtual file system representing its contents.
+  static VFSFromYAML *Create(MemoryBuffer *Buffer,
+                             SourceMgr::DiagHandlerTy DiagHandler,
----------------
Dmitri Gribenko wrote:
> Function names start with a lowercase letter.
Will do.  I think I'm used to seeing this one spelled upper-case almost everywhere else.

================
Comment at: lib/Basic/VirtualFileSystem.cpp:330-331
@@ +329,4 @@
+
+// A helper class to hold the common YAML parsing state.
+class VFSFromYAMLParser {
+  yaml::Stream &Stream;
----------------
Dmitri Gribenko wrote:
> Three slashes?
Will do.

================
Comment at: lib/Basic/VirtualFileSystem.cpp:378-381
@@ +377,6 @@
+
+    bool hasName = false;
+    bool hasKind = false;
+    bool hasContents = false; // external or otherwise
+    bool hasExternalContents = false;
+    std::vector<Entry *> EntryArrayContents;
----------------
Dmitri Gribenko wrote:
> Variable names should start with an uppercase letter.
D'oh! Will do.

================
Comment at: lib/Basic/VirtualFileSystem.cpp:397-398
@@ +396,4 @@
+      if (Key == "name") {
+        if (!parseScalarString(I->getValue(), Value, Buffer))
+          return NULL;
+        hasName = true;
----------------
Dmitri Gribenko wrote:
> Missing check if (hasName) error() ?
Duplicate keys are illegal in YAML, so I thought the YAML parser would handle that for me.  Apparently that is not done in llvm's parser.  I will handle this and the similar cases explicitly.

================
Comment at: lib/Basic/VirtualFileSystem.cpp:477
@@ +476,3 @@
+
+    if (Kind == EK_File) {
+      return new FileEntry(Name, ExternalContents);
----------------
Dmitri Gribenko wrote:
> Please switch() so that we will get a warning if we add more enumerators.
Will do.

================
Comment at: lib/Basic/VirtualFileSystem.cpp:542-554
@@ +541,15 @@
+Entry::~Entry() {}
+DirectoryEntry::~DirectoryEntry() {
+  for (std::vector<Entry *>::iterator I = Contents.begin(), E = Contents.end();
+       I != E; ++I) {
+    delete *I;
+  }
+}
+
+VFSFromYAML::~VFSFromYAML() {
+  for (std::vector<Entry *>::iterator I = Roots.begin(), E =Roots.end(); I != E;
+       ++I) {
+    delete *I;
+  }
+}
+
----------------
Dmitri Gribenko wrote:
> llvm::DeleteContainerPointers?
Cool, didn't know about that one.

================
Comment at: lib/Basic/VirtualFileSystem.cpp:591-595
@@ +590,7 @@
+    if (I->equals("..")) {
+      if (Components.empty())
+        return error_code(errc::invalid_argument, system_category());
+      Components.pop_back();
+      if (Components.empty())
+        return error_code(errc::invalid_argument, system_category());
+      continue;
----------------
Dmitri Gribenko wrote:
> Why error here?  Maybe there is some precondition about 'Path' that is not documented?
> 
> /../../../ is /, because / has a '..' entry that points back to itself.
> 
> What about paths relative to current dir?  Should not be passed here?
> 
> Also, I don't think we can actually normalize the path without looking at the filesystem.  normalize_path("/zzz/..") -> "/", but this is not how OS works:
> 
>   $ ls /zzz/..
>   ls: /zzz/..: No such file or directory
> 
You're right.  I was trying to handle .., since it is pervasive in paths the compiler uses (e.g. clang/bin/../lib/...), but as you pointed out it will do the wrong thing for nonexistent paths.

I think .. needs to be handled pretty much the same way as symlinks will need to be handled.  Ultimately, we will need to refer to the top-level vfs::FileSystem to resolve links and .. entries.  I'd like to remove the normalize path function and defer handling these cases for now.  I am starting to think that to handle these cases efficiently, we will need to evolve towards a different interface between file systems so that you can ask for foo/.. and just lookup a pointer to a directory entry, rather than have to re-query the top level file system for a path.



================
Comment at: lib/Basic/VirtualFileSystem.cpp:674-675
@@ +673,4 @@
+    Status S = DE->getStatus();
+    S.setName(PathStr);
+    S.setExternalName(PathStr);
+    return S;
----------------
Dmitri Gribenko wrote:
> Why do we overwrite external name with the virtual path?
It's not being overwritten per se, since we initialized the name and external name fields to "".  It would be better to save these in the status object once, but when we create the directory entry, we don't have an easy way to get the entire path, only the last component of it.

As for why we provide a virtual path in ExternalName, the reason is that ExternalName is supposed to be a usable string in all cases, so no one has to ask, is there an external name? if not, use 'name'.  This means that ExternalName == Name, *except* when we are dealing with a file that is mapped to disk (using 'external-contents').

================
Comment at: lib/Basic/VirtualFileSystem.cpp:705
@@ +704,3 @@
+  // dev_t value from the OS.
+  return UniqueID(std::numeric_limits<uint64_t>::max(), ID);
+}
----------------
Dmitri Gribenko wrote:
> This will only work correctly only as long as only a single YAML-based VFS exists, right?..
It will 'work' for any number of VFSes, since the unique IDs will never collide (globally increasing file id prevents this).  The files/directories will all have the same 'device' in the unique ID, but that should be fine, since I don't think userspace programs are able to infer anything from that number anyway, except that together with the inode number they provide a unique file identifier.

================
Comment at: unittests/Basic/VirtualFileSystemTest.cpp:229
@@ +228,3 @@
+
+TEST(VirtualFileSystemTest, basicVFSFromYAML) {
+  IntrusiveRefCntPtr<vfs::FileSystem> FS;
----------------
Dmitri Gribenko wrote:
> It would be nice to also check that we produce at least a single diagnostic in this case.
> 
> Actually it would be really good if we had one test for every diagnostic we produce.
I was planning to do this once I have added the clang option, and can use lit+FileCheck.  Doing diagnostic checks in the unit tests seems painful.

================
Comment at: unittests/Basic/VirtualFileSystemTest.cpp:246
@@ +245,3 @@
+    "  'type': 'directory',\n"
+    "  'name': '/',\n"
+    "  'contents': [ {\n"
----------------
Dmitri Gribenko wrote:
> Do we check while parsing the file that the top-level entity is a directory with name '/'?
Nope.  I don't know if we want to or not, so I erred on the side of being permissive in the code, but stricter in the tests.  That way I hopefully won't have to change both later :)

================
Comment at: unittests/Basic/VirtualFileSystemTest.cpp:296
@@ +295,3 @@
+
+TEST(VirtualFileSystemTest, caseSensitive) {
+  IntrusiveRefCntPtr<DummyFileSystem> Lower(new DummyFileSystem());
----------------
Dmitri Gribenko wrote:
> Test name should probably be 'CaseInsensitive'.
Yes, will do.  I assume you mean 'caseInsenstive' (lower case c).

================
Comment at: unittests/Basic/VirtualFileSystemTest.cpp:348-349
@@ +347,4 @@
+
+  IntrusiveRefCntPtr<vfs::OverlayFileSystem>
+    O(new vfs::OverlayFileSystem(Lower));
+  O->pushOverlay(FS);
----------------
Dmitri Gribenko wrote:
> Here and in other places: when breaking a line, the second and latter lines are double-indented at least.  I am not sure if this is explicit in the style guide, but this is what clang-format does.
I've seen a big variety of line-breaking styles throughout llvm.  I'll try running clang-format on these tests.


http://llvm-reviews.chandlerc.com/D2835



More information about the cfe-commits mailing list