[llvm-commits] PathV2 review notes

Dan Gohman gohman at apple.com
Mon Dec 6 09:48:00 PST 2010


Hello,

I read through the current PathV2 platform-independent and Unix code.
I'm aware that it's not complete yet, but there's already a lot of code there.
Below are some review comments.

include/llvm/Support/PathV2.h:

> /// @name Lexical Modifiers
> /// @{

> /// @brief Make \a path an absolute path.
> ///
> /// Makes \a path absolute using the current directory if it is not already. An
> /// empty \a path will result in the current directory.
> ///
> /// /absolute/path   => /absolute/path
> /// relative/../path => <current-directory>/path
> ///
> /// @param path A path that is modified to be an absolute path.
> /// @returns errc::success if \a path has been made absolute, otherwise a
> ///          platform specific error_code.
> error_code make_absolute(SmallVectorImpl<char> &path);

The top-level section comment suggests that this is a lexical
operation, but it is not.

> /// @brief Remove the last component from \a path if it exists.
> ///
> /// directory/filename.cpp => directory/
> /// directory/             => directory
> ///
> /// @param path A path that is modified to not have a file component.
> /// @returns errc::success if \a path's file name has been removed (or there was
> ///          not one to begin with), otherwise a platform specific error_code.
> error_code remove_filename(SmallVectorImpl<char> &path);

The words "if it exists" are a little misleading in this context, since
this is actually a purely lexical transformation. The fact that this function
returns an error code contributes to the ambiguity.

On a related note, there are many functions (root_directory, root_path,
is_absolute, etc.) like this which are purely lexical, and implemented
with platform-independent code, and which always return success. It's
unfortunate that clients are required to cope with "a platform specific
error_code" when working with any of these interfaces, since it isn't
really needed.

Unlike Boost, LLVM doesn't propogate malloc errors. If malloc fails,
LLVM crashes (at best). We can have interesting discussions about
whether or not this is a bug, but it does simplify many things.
Also unlike Boost, LLVM's libSupport can easily change its API if it
ever somehow makes sense for these functions to return errors in the
future.

> /// @brief Replace the file extension of \a path with \a extension.
> ///
> /// ./filename.cpp => ./filename.extension
> /// ./filename     => ./filename.extension
> /// ./             => ? TODO: decide what semantics this has.
> ///
> /// @param path A path that has its extension replaced with \a extension.
> /// @param extension The extension to be added. It may be empty. It may also
> ///                  optionally start with a '.', if it does not, one will be
> ///                  prepended.
> /// @returns errc::success if \a path's extension has been replaced, otherwise a
> ///          platform specific error_code.
> error_code replace_extension(SmallVectorImpl<char> &path,
>                              const Twine &extension);

The behavior of prepending a '.' if the suffix doesn't already have
one is confusing. Since extension includes the dot, it would be
consistent for replace_extension to require it. It could even
verify this with an assert.

> /// @name Lexical Observers
> /// @{
> 
> /// @brief Get the current path.
> ///
> /// @param result Holds the current path on return.
> /// @results errc::success if the current path has been stored in result,
> ///          otherwise a platform specific error_code.
> error_code current_path(SmallVectorImpl<char> &result);

Same as above; this isn't a purely lexical operation.

> /// @brief Get root name.
> ///
> /// //net/hello => //net
> /// c:/hello    => c: (on Windows, on other platforms nothing)
> /// /hello      => <empty>

UNC-style double-slash pathnames are not well known in Unix circles;
a comment on that first example would be helpful.

As an aside, does it really make sense to implement UNC pathnames
on systems where the underlying libc API doesn't implement them? I
realize you're just following Boost here though.

More broadly, there seem to be two major ways of bisecting paths
in this API: root+relative and parent+filename (as this API names them).
Parent+filename is obvious, but when would root+relative be useful for
LLVM? Compilers and related tools should never go digging around in
the filesystem root on their own.

lib/Support/PathV2.cpp

> native

>   // Clear result.
>   result.set_size(0);

Should this be result.clear()? set_size is usually an indication
that something special is happening, and that doesn't appear to
be the case here.

> error_code has_root_directory(const Twine &path, bool &result) {
>   SmallString<128> path_storage;
>   StringRef p = path.toStringRef(path_storage);
> 
>   if (error_code ec = root_directory(p, p)) return ec;
> 
>   result = !p.empty();
>   return success;
> }

Calling root_directory(p, p) here looks unsafe: p isn't guaranteed
to be pointing to path_storage, and root_directory writes
through it.

lib/Support/Unix/PathV2.inc:

> #if HAVE_STDIO_H
> #include <stdio.h>
> #endif

Shouldn't stdio.h always be available?

>     ~AutoFD() {
>       if (FileDescriptor >= 0)
>         ::close(FileDescriptor);
>     }

This AutoFD class not check for errors on ::close, so hopefully it'll only
be used in contexts where that doesn't matter. A comment about this would
be appropriate.

> TempDir

>     (dir = std::getenv("TMPDIR" )) ||

It's suspicious for the path library to be using "TMPDIR" privately
like this. What is this for?

> error_code current_path(SmallVectorImpl<char> &result) {
>   long size = ::pathconf(".", _PC_PATH_MAX);
>   result.reserve(size + 1);
>   result.set_size(size + 1);
> 
>   if (::getcwd(result.data(), result.size()) == 0)
>     return error_code(errno, system_category());
> 
>   result.set_size(strlen(result.data()));
>   return success;
> }

POSIX describes _PC_PATH_MAX as indicating "the longest relative pathname
that could be given if the specified directory is the process' current
working directory", which isn't what it's being used for here. For a
fully general implementation, just do a simple loop which starts at
MAXPATHLEN and reallocates the buffer until getcwd succeeds.

> namespace fs{

Whitespace before {.

> copy_file

FWIW, it appears copy_file is apparently only used by llvm-ld and only on
Windows, so the Unix implementation is currently dead code.

>   // Open from.
>   if ((from_file = ::open(f.begin(), O_RDONLY)) < 0)
>     return error_code(errno, system_category());
>   AutoFD from_fd(from_file);
> 
>   // Stat from.
>   struct stat from_stat;
>   if (::stat(f.begin(), &from_stat) != 0)
>     return error_code(errno, system_category());

It's more efficient to use fstat than stat, since the file descriptor
is right there. llvm-ld's use of copy_file probably doesn't care about
performance, but in general this is an idiom to watch out for.

>   if (copt == copy_option::fail_if_exists)
>     to_flags |= O_EXCL;

The old CopyFile implementation doesn't have this. What is it for?

>   // Open to.
>   if ((to_file = ::open(t.begin(), to_flags, from_stat.st_mode)) < 0)
>     return error_code(errno, system_category());
>   AutoFD to_fd(to_file);

AutoFD isn't really needed here, and in fact it makes the code harder
to follow. If AutoFD ever does its job, it would be calling ::close
without checking for errors, so the only way this code is correct is
if AutoFD is never permitted to do its job.

>   // After all the file operations above the return value of close actually
>   // matters.
>   if (::close(from_fd.take()) < 0) sz_read = -1;
>   if (::close(to_fd.take()) < 0) sz_read = -1;
> 
>   // Check for errors.
>   if (sz_read < 0)
>     return error_code(errno, system_category());

If errors from ::close on from_fd actually matter, this risks reporting
the wrong error message, since errno could be clobbered by the second
::close (even if it succeeds).

Also, the old CopyFile returned error messages which indicated
whether the error was in the source or the destination. This API
doesn't expose that information at all, so clients won't be able to
produce a descriptive error message.

> create_directory

>   if (::mkdir(p.begin(), 0700) == -1) {

Please use the symbolic names instead of a raw octal value.
I believe this is S_IRWXU.

Also, the old createDiectoryOnDisk used S_IRWXU | S_IRWXG. Is
it an intentional change to start ignoring the group portion of
the user's umask settings?  If so, please add comments
explaining this choice.

Also, is the 'existed' argument really needed?

> create_hard_link
> create_symlink

The old Path library doesn't have these. What are they for?

> remove

>   if (::remove(p.begin()) == -1) {
>     if (errno != errc::no_such_file_or_directory)
>       return error_code(errno, system_category());

Using the errc namespace here conflates API levels. This code
should use ENOENT directly instead of no_such_file_or_directory.

> resize_file

The old Path doesn't have this. What is it for?

> error_code exists(const Twine &path, bool &result) {

The old Path::exists returned false on EACCESS. This code will return
an error condition. Is this an intentional change?

> file_size

What is this for? Doing a full ::stat call and throwing away all
the information except st_size is suspicious.

> unique_file

>   [... lots of stuff ...]

Please don't try to do what mkstemp does manually. Just use mkstemp.

mkstemp is very widely available these days. For portability to
exotic platforms, it's fine to put the burden on people who care
to contribute patches.

Dan





More information about the llvm-commits mailing list