[llvm-commits] PathV2 review notes

Mon Dec 6 17:21:56 PST 2010

On Dec 6, 2010, at 1:50 PM, Michael Spencer wrote:

> On Mon, Dec 6, 2010 at 12:48 PM, Dan Gohman <gohman at apple.com> wrote:
> 
>> include/llvm/Support/PathV2.h:
>> 
>>> /// @name Lexical Modifiers
>>> /// @{
>> 
>>> /// @brief Make \a path an absolute path.
>>> ///
>>> /// Makes \a path absolute using the current directory if it is not already. An
>>> /// empty \a path will result in the current directory.
>>> ///
>>> /// /absolute/path   => /absolute/path
>>> /// relative/../path => <current-directory>/path
>>> ///
>>> /// @param path A path that is modified to be an absolute path.
>>> /// @returns errc::success if \a path has been made absolute, otherwise a
>>> ///          platform specific error_code.
>>> error_code make_absolute(SmallVectorImpl<char> &path);
>> 
>> The top-level section comment suggests that this is a lexical
>> operation, but it is not.
> 
> I wasn't completely sure where to group this function. It is lexical
> in that it does not resolve any symlinks or other weirdness. However,
> it does use the current directory if the path is not already absolute.
> 
> I also just noticed that the documentation is incorrect. This function
> does not resolve '.' or '..', thus:
> relative/../path => <current-directory>/relative/../path
> 
> As it stands, I would either agree with keeping the function as is and
> moving it over to sys::fs and FileSystem.h, or I would like to add a
> base argument and a sys::fs::initial_directory function (which returns
> the current directory as it would be immediately before main).
> 
> This really depends on how often and when a base directory other than
> current is used. If make_absolute is used in any of the libraries then
> there is technically a race condition, because another part of the
> process could change the current directory at any time (although most
> programs don't do this).

Moving it to FileSystem.h makes sense to me.

>> On a related note, there are many functions (root_directory, root_path,
>> is_absolute, etc.) like this which are purely lexical, and implemented
>> with platform-independent code, and which always return success. It's
>> unfortunate that clients are required to cope with "a platform specific
>> error_code" when working with any of these interfaces, since it isn't
>> really needed.
>> 
>> Unlike Boost, LLVM doesn't propogate malloc errors. If malloc fails,
>> LLVM crashes (at best). We can have interesting discussions about
>> whether or not this is a bug, but it does simplify many things.
>> Also unlike Boost, LLVM's libSupport can easily change its API if it
>> ever somehow makes sense for these functions to return errors in the
>> future.
> 
> I agree that it is awkward. I decided to return error_codes to stay
> consistent with the rest of the API, and to allow handling memory
> errors. However, now that we have the path, and fs namespace split, I
> think it would be ok to remove the error_code return from path. I
> don't see SmallVector and friends getting any support for reporting
> memory allocation errors any time soon.
> 
> This would also require moving current_path (and make_absolute as it
> is currently designed) into fs.

I think that sounds reasonable. 

> 
>>> /// @brief Replace the file extension of \a path with \a extension.
>>> ///
>>> /// ./filename.cpp => ./filename.extension
>>> /// ./filename     => ./filename.extension
>>> /// ./             => ? TODO: decide what semantics this has.
>>> ///
>>> /// @param path A path that has its extension replaced with \a extension.
>>> /// @param extension The extension to be added. It may be empty. It may also
>>> ///                  optionally start with a '.', if it does not, one will be
>>> ///                  prepended.
>>> /// @returns errc::success if \a path's extension has been replaced, otherwise a
>>> ///          platform specific error_code.
>>> error_code replace_extension(SmallVectorImpl<char> &path,
>>>                              const Twine &extension);
>> 
>> The behavior of prepending a '.' if the suffix doesn't already have
>> one is confusing. Since extension includes the dot, it would be
>> consistent for replace_extension to require it. It could even
>> verify this with an assert.
> 
> I don't have a strong opinion on this either way other than I don't
> feel it's confusing.

I find it confusing when some clients use the API differently
than other clients in ways that appear significant but aren't.

>>> /// @brief Get root name.
>>> ///
>>> /// //net/hello => //net
>>> /// c:/hello    => c: (on Windows, on other platforms nothing)
>>> /// /hello      => <empty>
>> 
>> UNC-style double-slash pathnames are not well known in Unix circles;
>> a comment on that first example would be helpful.
> 
> OK, from what I understand POSIX supports them.

All POSIX says is "A pathname that begins with two successive slashes
may be interpreted in an implementation-defined manner." In practice,
I don't know any implementation which does anything with this.

> 
>> As an aside, does it really make sense to implement UNC pathnames
>> on systems where the underlying libc API doesn't implement them? I
>> realize you're just following Boost here though.
>> 
>> More broadly, there seem to be two major ways of bisecting paths
>> in this API: root+relative and parent+filename (as this API names them).
>> Parent+filename is obvious, but when would root+relative be useful for
>> LLVM? Compilers and related tools should never go digging around in
>> the filesystem root on their own.
> 
> I do not expect these functions to be used directly, however, they are
> used extensively in the implementation of other functions.

Ok. It may make sense to make them private to the PathV2 implementation then.

> 
>> lib/Support/PathV2.cpp
> 
>>> error_code has_root_directory(const Twine &path, bool &result) {
>>>   SmallString<128> path_storage;
>>>   StringRef p = path.toStringRef(path_storage);
>>> 
>>>   if (error_code ec = root_directory(p, p)) return ec;
>>> 
>>>   result = !p.empty();
>>>   return success;
>>> }
>> 
>> Calling root_directory(p, p) here looks unsafe: p isn't guaranteed
>> to be pointing to path_storage, and root_directory writes
>> through it.
> 
> root_directory doesn't write.

Ok, I see now.

>>> TempDir
>> 
>>>     (dir = std::getenv("TMPDIR" )) ||
>> 
>> It's suspicious for the path library to be using "TMPDIR" privately
>> like this. What is this for?
> 
> I implemented the function the way the Open Group standard defines
> tmpname as looking for the temp directory. It is private because
> clients should not care where the temp directory is or directly use
> it.

I now see that this is just part of the unique_file implementation
which I claim is unnecessary ;-).

>>> copy_file
>> 
>> FWIW, it appears copy_file is apparently only used by llvm-ld and only on
>> Windows, so the Unix implementation is currently dead code.
> 
> Should it be removed?

Sure.

>>> remove
>> 
>>>   if (::remove(p.begin()) == -1) {
>>>     if (errno != errc::no_such_file_or_directory)
>>>       return error_code(errno, system_category());
>> 
>> Using the errc namespace here conflates API levels. This code
>> should use ENOENT directly instead of no_such_file_or_directory.
> 
> I used the errc enum constants because they are infinitely more
> readable than the ERRNO macros. It's safe because they are guaranteed
> to have the same value.

I find ENOENT more readable because it's what the API that you're
using here is documented to use.

>>> file_size
>> 
>> What is this for? Doing a full ::stat call and throwing away all
>> the information except st_size is suspicious.
> 
> There are a lot of functions that take both a path and a file_status,
> this should be one of them. If you only need the file size you just
> call this, otherwise you call status and reuse the file_status class
> (which currently only stores st_mode, but that will change).

My main observation here is that nothing in LLVM ever just needs the
file size without needing anything else.

> 
>>> unique_file
>> 
>>>   [... lots of stuff ...]
>> 
>> Please don't try to do what mkstemp does manually. Just use mkstemp.
>> 
>> mkstemp is very widely available these days. For portability to
>> exotic platforms, it's fine to put the burden on people who care
>> to contribute patches.
>> 
>> Dan
> 
> There are quite a few things I don't like about using mkstemp for LLVM:
> * You can't specify any of the flags used to create the file.

That's true, but I don't think it's important here.

> * Filenames are required to end in XXXXXX (I've never understood this
> requirement).

Those are the characters that mkstemp overwrites to make the filename
unique.

> * You can only use an absolute path, so you end up explicitly using
> /tmp. Although this can be handled before mkstemp is called.

mkstemp does support relative paths.

> 
> If we're OK with all that, then I don't mind switching the Unix impl
> over to mkstemp if it's present.

Thanks,

Dan