[PATCH] D17104: [VFS] Drop path traversal assertion

Bruno Cardoso Lopes via cfe-commits cfe-commits at lists.llvm.org
Thu Feb 11 12:48:27 PST 2016


bruno added a comment.

> I don't think this is the right approach.  If we don't canonicalize the source path then:

> 

> - looking up the path *without* the .. won't work, which means anything that looks up a realpath will fail


If we canonicalize, then it needs to be done in two places:

1. In the module dependence collector while collecting header files, which will then collect the real path.

Doing it at this point guarantees that we will only write out canonicalized paths to the YAML file.

2. While requesting paths from the VFS overlay, we must first canonicalize

the path in

  RedirectingFileSystem::lookupPath(const Twine &Path_)

before calling out

  RedirectingFileSystem::lookupPath(sys::path::const_iterator Start,
                                    sys::path::const_iterator End, Entry *From)

One way to make this work is to use llvm::sys::path::remove_dots(Path, true) in (1) and (2).
That said, instead of this YAML output:

{

  'type': 'directory',
  'name': "/install-dir/bin/../lib/clang/3.8.0/include",
  'contents': [
    ...
    {
      'type': 'file',
      'name': "altivec.h",
      'external-contents': "<path_to_cache>/install-dir/bin/../lib/clang/3.8.0/include/altivec.h"
    },
    ...

We would have:

{

  'type': 'directory',
  'name': "/install-dir/lib/clang/3.8.0/include",
  'contents': [
    ...
    {
      'type': 'file',
      'name': "altivec.h",
      'external-contents': "<path_to_cache>/install-dir/lib/clang/3.8.0/include/altivec.h"
    },
    ...

Therefore, while using the new YAML file, whenever the FileManager requests status in (2) for the path "/install-dir/bin/../lib/clang/3.8.0/include/altivec.h", it first call llvm::sys::path::remove_dots(Path, true), which will give back "/install-dir/lib/clang/3.8.0/include/altivec.h", lookupPath is going to be able to match this entry from the YAML file and success!

However, if in the original filesystem "bin" is a symlink, in (1) we would have to use "realpath()" instead of "remove_dots()", let's say it yields "/private/tmp/a/b/lib/clang/3.8.0/include/altivec.h", the YAML will look like:

{

  'type': 'directory',
  'name': "/private/tmp/a/b/lib/clang/3.8.0/include",
  'contents': [
    ...
    {
      'type': 'file',
      'name': "altivec.h",
      'external-contents': "<path_to_cache>/private/tmp/a/b/lib/clang/3.8.0/include/altivec.h"
    },
    ...

Now, we try to use the new YAML file + cache from a new clang invocation. When the FileManager requests status in (2) for the path "/install-dir/bin/../lib/clang/3.8.0/include/altivec.h" neither "remove_dots()" nor "realpath()" would be able to resolve it to "/private/tmp/a/b/lib/clang/3.8.0/include/altivec.h" and will fallback to search the real filesystem instead, if this is a different machine, it will fail. Since we are looking at a virtual path in (2) it could makes sense to use remove_dots() but it won't make sense to use "realpath()".

How do you propose we fix that?

Another issue is that "realpath()" is expensive, but I guess we could only use it whenever the path contains ".." components and cache the base directory for speeding up translations. Another solution is to generate two entries in the YAML file:

{

  'type': 'directory',
  'name': "/install-dir/bin/../lib/clang/3.8.0/include",
  'contents': [
    ...
    {
      'type': 'file',
      'name': "altivec.h",
      'external-contents': "<path_to_cache>/private/tmp/a/b/lib/clang/3.8.0/include/altivec.h"
    },
  'type': 'directory',
  'name': "/private/tmp/a/b/lib/clang/3.8.0/include",
  'contents': [
    ...
    {
      'type': 'file',
      'name': "altivec.h",
      'external-contents': "<path_to_cache>/private/tmp/a/b/lib/clang/3.8.0/include/altivec.h"
    },
    ...

> - directory iteration won't combine entries from foo/bar/.. and foo/


Yes, but if "bar" is a symbolic link we have another set of problems, like mentioned above.

> I think we're better off handling ".." in lookup and always using canonicalized paths in the YAML representation.


Makes sense, we just need to come with a general solution for the symbolic link in path components.


http://reviews.llvm.org/D17104





More information about the cfe-commits mailing list