[llvm-commits] PathV2 review notes

Michael Spencer bigcheesegs at gmail.com
Mon Dec 6 13:50:59 PST 2010


On Mon, Dec 6, 2010 at 12:48 PM, Dan Gohman <gohman at apple.com> wrote:
> Hello,
>
> I read through the current PathV2 platform-independent and Unix code.
> I'm aware that it's not complete yet, but there's already a lot of code there.
> Below are some review comments.

Hello and thanks very much for looking at this. I am not a Unix/POSIX
API expert, in fact, this is the first time I've looked at the
documentation for most of these calls.

> include/llvm/Support/PathV2.h:
>
>> /// @name Lexical Modifiers
>> /// @{
>
>> /// @brief Make \a path an absolute path.
>> ///
>> /// Makes \a path absolute using the current directory if it is not already. An
>> /// empty \a path will result in the current directory.
>> ///
>> /// /absolute/path   => /absolute/path
>> /// relative/../path => <current-directory>/path
>> ///
>> /// @param path A path that is modified to be an absolute path.
>> /// @returns errc::success if \a path has been made absolute, otherwise a
>> ///          platform specific error_code.
>> error_code make_absolute(SmallVectorImpl<char> &path);
>
> The top-level section comment suggests that this is a lexical
> operation, but it is not.

I wasn't completely sure where to group this function. It is lexical
in that it does not resolve any symlinks or other weirdness. However,
it does use the current directory if the path is not already absolute.

I also just noticed that the documentation is incorrect. This function
does not resolve '.' or '..', thus:
relative/../path => <current-directory>/relative/../path

As it stands, I would either agree with keeping the function as is and
moving it over to sys::fs and FileSystem.h, or I would like to add a
base argument and a sys::fs::initial_directory function (which returns
the current directory as it would be immediately before main).

This really depends on how often and when a base directory other than
current is used. If make_absolute is used in any of the libraries then
there is technically a race condition, because another part of the
process could change the current directory at any time (although most
programs don't do this).

>> /// @brief Remove the last component from \a path if it exists.
>> ///
>> /// directory/filename.cpp => directory/
>> /// directory/             => directory
>> ///
>> /// @param path A path that is modified to not have a file component.
>> /// @returns errc::success if \a path's file name has been removed (or there was
>> ///          not one to begin with), otherwise a platform specific error_code.
>> error_code remove_filename(SmallVectorImpl<char> &path);
>
> The words "if it exists" are a little misleading in this context, since
> this is actually a purely lexical transformation. The fact that this function
> returns an error code contributes to the ambiguity.

Yes, the "if it exists" is incorrect. The function actually removes
the last component unless it is the root directory.

> On a related note, there are many functions (root_directory, root_path,
> is_absolute, etc.) like this which are purely lexical, and implemented
> with platform-independent code, and which always return success. It's
> unfortunate that clients are required to cope with "a platform specific
> error_code" when working with any of these interfaces, since it isn't
> really needed.
>
> Unlike Boost, LLVM doesn't propogate malloc errors. If malloc fails,
> LLVM crashes (at best). We can have interesting discussions about
> whether or not this is a bug, but it does simplify many things.
> Also unlike Boost, LLVM's libSupport can easily change its API if it
> ever somehow makes sense for these functions to return errors in the
> future.

I agree that it is awkward. I decided to return error_codes to stay
consistent with the rest of the API, and to allow handling memory
errors. However, now that we have the path, and fs namespace split, I
think it would be ok to remove the error_code return from path. I
don't see SmallVector and friends getting any support for reporting
memory allocation errors any time soon.

This would also require moving current_path (and make_absolute as it
is currently designed) into fs.

>> /// @brief Replace the file extension of \a path with \a extension.
>> ///
>> /// ./filename.cpp => ./filename.extension
>> /// ./filename     => ./filename.extension
>> /// ./             => ? TODO: decide what semantics this has.
>> ///
>> /// @param path A path that has its extension replaced with \a extension.
>> /// @param extension The extension to be added. It may be empty. It may also
>> ///                  optionally start with a '.', if it does not, one will be
>> ///                  prepended.
>> /// @returns errc::success if \a path's extension has been replaced, otherwise a
>> ///          platform specific error_code.
>> error_code replace_extension(SmallVectorImpl<char> &path,
>>                              const Twine &extension);
>
> The behavior of prepending a '.' if the suffix doesn't already have
> one is confusing. Since extension includes the dot, it would be
> consistent for replace_extension to require it. It could even
> verify this with an assert.

I don't have a strong opinion on this either way other than I don't
feel it's confusing.

>> /// @name Lexical Observers
>> /// @{
>>
>> /// @brief Get the current path.
>> ///
>> /// @param result Holds the current path on return.
>> /// @results errc::success if the current path has been stored in result,
>> ///          otherwise a platform specific error_code.
>> error_code current_path(SmallVectorImpl<char> &result);
>
> Same as above; this isn't a purely lexical operation.

I agree.

>> /// @brief Get root name.
>> ///
>> /// //net/hello => //net
>> /// c:/hello    => c: (on Windows, on other platforms nothing)
>> /// /hello      => <empty>
>
> UNC-style double-slash pathnames are not well known in Unix circles;
> a comment on that first example would be helpful.

OK, from what I understand POSIX supports them.

> As an aside, does it really make sense to implement UNC pathnames
> on systems where the underlying libc API doesn't implement them? I
> realize you're just following Boost here though.
>
> More broadly, there seem to be two major ways of bisecting paths
> in this API: root+relative and parent+filename (as this API names them).
> Parent+filename is obvious, but when would root+relative be useful for
> LLVM? Compilers and related tools should never go digging around in
> the filesystem root on their own.

I do not expect these functions to be used directly, however, they are
used extensively in the implementation of other functions.

> lib/Support/PathV2.cpp
>
>> native
>
>>   // Clear result.
>>   result.set_size(0);
>
> Should this be result.clear()? set_size is usually an indication
> that something special is happening, and that doesn't appear to
> be the case here.

You are correct. I think I assumed that clear freed memory or something.

>> error_code has_root_directory(const Twine &path, bool &result) {
>>   SmallString<128> path_storage;
>>   StringRef p = path.toStringRef(path_storage);
>>
>>   if (error_code ec = root_directory(p, p)) return ec;
>>
>>   result = !p.empty();
>>   return success;
>> }
>
> Calling root_directory(p, p) here looks unsafe: p isn't guaranteed
> to be pointing to path_storage, and root_directory writes
> through it.

root_directory doesn't write.

> lib/Support/Unix/PathV2.inc:
>
>> #if HAVE_STDIO_H
>> #include <stdio.h>
>> #endif
>
> Shouldn't stdio.h always be available?

I didn't add the configure check :P. I just added the HAVE_ checks
either where I knew there could be a problem, or it already existed.

>>     ~AutoFD() {
>>       if (FileDescriptor >= 0)
>>         ::close(FileDescriptor);
>>     }
>
> This AutoFD class not check for errors on ::close, so hopefully it'll only
> be used in contexts where that doesn't matter. A comment about this would
> be appropriate.

OK.

>> TempDir
>
>>     (dir = std::getenv("TMPDIR" )) ||
>
> It's suspicious for the path library to be using "TMPDIR" privately
> like this. What is this for?

I implemented the function the way the Open Group standard defines
tmpname as looking for the temp directory. It is private because
clients should not care where the temp directory is or directly use
it.

>> error_code current_path(SmallVectorImpl<char> &result) {
>>   long size = ::pathconf(".", _PC_PATH_MAX);
>>   result.reserve(size + 1);
>>   result.set_size(size + 1);
>>
>>   if (::getcwd(result.data(), result.size()) == 0)
>>     return error_code(errno, system_category());
>>
>>   result.set_size(strlen(result.data()));
>>   return success;
>> }
>
> POSIX describes _PC_PATH_MAX as indicating "the longest relative pathname
> that could be given if the specified directory is the process' current
> working directory", which isn't what it's being used for here. For a
> fully general implementation, just do a simple loop which starts at
> MAXPATHLEN and reallocates the buffer until getcwd succeeds.

OK.

>> namespace fs{
>
> Whitespace before {.

OK.

>> copy_file
>
> FWIW, it appears copy_file is apparently only used by llvm-ld and only on
> Windows, so the Unix implementation is currently dead code.

Should it be removed?

>>   // Open from.
>>   if ((from_file = ::open(f.begin(), O_RDONLY)) < 0)
>>     return error_code(errno, system_category());
>>   AutoFD from_fd(from_file);
>>
>>   // Stat from.
>>   struct stat from_stat;
>>   if (::stat(f.begin(), &from_stat) != 0)
>>     return error_code(errno, system_category());
>
> It's more efficient to use fstat than stat, since the file descriptor
> is right there. llvm-ld's use of copy_file probably doesn't care about
> performance, but in general this is an idiom to watch out for.

OK.

>>   if (copt == copy_option::fail_if_exists)
>>     to_flags |= O_EXCL;
>
> The old CopyFile implementation doesn't have this. What is it for?

It allows correctly choosing to either overwrite or error when the to
file already exists. Not having an option forces a race condition. It
was added because it was part of the filesystem v3 API.

>>   // Open to.
>>   if ((to_file = ::open(t.begin(), to_flags, from_stat.st_mode)) < 0)
>>     return error_code(errno, system_category());
>>   AutoFD to_fd(to_file);
>
> AutoFD isn't really needed here, and in fact it makes the code harder
> to follow. If AutoFD ever does its job, it would be calling ::close
> without checking for errors, so the only way this code is correct is
> if AutoFD is never permitted to do its job.

You are correct that the to_fd AutoFD is unused. The from_fd is used
correctly if either of the two return statements after it are reached.

>>   // After all the file operations above the return value of close actually
>>   // matters.
>>   if (::close(from_fd.take()) < 0) sz_read = -1;
>>   if (::close(to_fd.take()) < 0) sz_read = -1;
>>
>>   // Check for errors.
>>   if (sz_read < 0)
>>     return error_code(errno, system_category());
>
> If errors from ::close on from_fd actually matter, this risks reporting
> the wrong error message, since errno could be clobbered by the second
> ::close (even if it succeeds).

It only matters if an error occurred, not which fd caused it.

> Also, the old CopyFile returned error messages which indicated
> whether the error was in the source or the destination. This API
> doesn't expose that information at all, so clients won't be able to
> produce a descriptive error message.

Adding that information is easy enough. I just need to add a custom
error_category for it. However, if copy_file is really not used, it
doesn't seem worth it.

>> create_directory
>
>>   if (::mkdir(p.begin(), 0700) == -1) {
>
> Please use the symbolic names instead of a raw octal value.
> I believe this is S_IRWXU.

OK.

> Also, the old createDiectoryOnDisk used S_IRWXU | S_IRWXG. Is
> it an intentional change to start ignoring the group portion of
> the user's umask settings?  If so, please add comments
> explaining this choice.

It was not.

> Also, is the 'existed' argument really needed?

It helps in the implementation of some other functions.

>> create_hard_link
>> create_symlink
>
> The old Path library doesn't have these. What are they for?

Mainly I wanted them to add tests for equivalence for when the clang
preprocessor is switched over. I have no problem removing them.

>> remove
>
>>   if (::remove(p.begin()) == -1) {
>>     if (errno != errc::no_such_file_or_directory)
>>       return error_code(errno, system_category());
>
> Using the errc namespace here conflates API levels. This code
> should use ENOENT directly instead of no_such_file_or_directory.

I used the errc enum constants because they are infinitely more
readable than the ERRNO macros. It's safe because they are guaranteed
to have the same value.

>> resize_file
>
> The old Path doesn't have this. What is it for?

It's unused. I have no problem removing it.

>> error_code exists(const Twine &path, bool &result) {
>
> The old Path::exists returned false on EACCESS. This code will return
> an error condition. Is this an intentional change?

Yes. Not having permission is not the same as the file not existing.

>> file_size
>
> What is this for? Doing a full ::stat call and throwing away all
> the information except st_size is suspicious.

There are a lot of functions that take both a path and a file_status,
this should be one of them. If you only need the file size you just
call this, otherwise you call status and reuse the file_status class
(which currently only stores st_mode, but that will change).

>> unique_file
>
>>   [... lots of stuff ...]
>
> Please don't try to do what mkstemp does manually. Just use mkstemp.
>
> mkstemp is very widely available these days. For portability to
> exotic platforms, it's fine to put the burden on people who care
> to contribute patches.
>
> Dan

There are quite a few things I don't like about using mkstemp for LLVM:
* You can't specify any of the flags used to create the file.
* Filenames are required to end in XXXXXX (I've never understood this
requirement).
* You can only use an absolute path, so you end up explicitly using
/tmp. Although this can be handled before mkstemp is called.

If we're OK with all that, then I don't mind switching the Unix impl
over to mkstemp if it's present.

Thanks very much for the detailed review!

- Michael Spencer




More information about the llvm-commits mailing list