[PATCH] D87667: [Support/Path] Add path::is_gnu_style_absolute

James Henderson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 16 01:24:22 PDT 2020


jhenderson added a comment.

In D87667#2275659 <https://reviews.llvm.org/D87667#2275659>, @tinti wrote:

> In D87667#2273502 <https://reviews.llvm.org/D87667#2273502>, @jhenderson wrote:
>
>>> On Windows, paths starting with drive letter pattern are absolute.
>>
>> This isn't a difference is it? That's also the behaviour for `is_absolute`, I believe?
>
> I forgot this comment.
>
> `is_absolute` is really hard to read. But I think you are right.
>
> I think it considers `c:\` as absolute but does not `c:` [1].
>
> [1] https://github.com/llvm/llvm-project/blob/master/llvm/lib/Support/Path.cpp#L69

That would make sense because `c:` is not a valid path on Windows. For example, if you try to do `cd c:` or `dir c:` in a command prompt, you just end up using the current directory, wherever you are. If you are deliberately accepting `<drive letter>:`, then you need to say "On Windows, paths that are just are drive letter pattern are absolute", since the "starting with" includes "C:\foo" for example, which is the same for is_absolute.

I might actually be inclined to change the description to say simply that it follow's GNU's behaviour, and not focus on the differences, but rather state the GNU behaviour:

1. Paths starting with a path separator are absolute.
2. Windows style paths are also absolute if they start with a character followed by ':'.
3. No other paths are absolute.

I'd also put this description in the function comment in the header.



================
Comment at: llvm/lib/Support/Path.cpp:690
 
+bool is_gnu_style_absolute(const Twine &Path, Style Style) {
+  SmallString<128> PathStorage;
----------------
tinti wrote:
> jhenderson wrote:
> > I'd put this function adjacent to `is_absolute`, like it is in the header.
> > 
> > Also, I think you can use the same naming style for variables as in `is_absolute` (i.e. lowerCameCase).
> > 
> > Rather than implement this whole function from scratch, maybe you could just do the bit that's different from `is_absolute` first, and then just call `is_absolute`?
> > I'd put this function adjacent to `is_absolute`, like it is in the header.
> 
> Ok.
> 
> > Also, I think you can use the same naming style for variables as in `is_absolute` (i.e. lowerCameCase).
> 
> Ok. Sorry clang-tidy.
> 
> > Rather than implement this whole function from scratch, maybe you could just do the bit that's different from `is_absolute` first, and then just call `is_absolute`?
> 
> I tried. And I intentionally avoided it. My original implementation called `is_absolute` at the end.
> Now I don't remember if it fails in one of my tests (which I removed because I tested with the data from `TEST(Support, Path)`) or if it is not needed.
> 
> My argument is that It needs to be as readable as possible. I tried to red `is_absolute` and may:
> 
> -- call has_root_directory()
> ---- call root_directory()
> ------ create an iterator()
> ------ call is_separator()
> -- call has_root_name()
> ---- call root_name()
> ------ create an iterator()
> ------ call is_separator() 
> 
> I think is too much complexity to one thing that must be false. After reading the code from GNU it is more clear. The `#ifdef` only sets or unsets `dos_based`.
> 
> See below:
> 
> ```
> #define IS_DIR_SEPARATOR_1(dos_based, c)				\
>   (((c) == '/')								\
>    || (((c) == '\\') && (dos_based)))
> 
> #define HAS_DRIVE_SPEC_1(dos_based, f)			\
>   ((f)[0] && ((f)[1] == ':') && (dos_based))
> 
> #define IS_ABSOLUTE_PATH_1(dos_based, f)		 \
>   (IS_DIR_SEPARATOR_1 (dos_based, (f)[0])		 \
>    || HAS_DRIVE_SPEC_1 (dos_based, f))
> ```
> 
> Comparing with my rules:
> 
> 1) Paths starting with '/';
> 
> ```
> #define IS_DIR_SEPARATOR_1(dos_based, c)				\
>   (((c) == '/')								\
>    || (((c) == '\\') && (dos_based)))
> ```
> 
> 2) On Windows, paths starting with '\\';
> 
> ```
> #define IS_DIR_SEPARATOR_1(dos_based, c)				\
>   (((c) == '/')								\
>    || (((c) == '\\') && (dos_based)))
> ```
> 
> 3) On Windows, paths starting with a drive letter pattern.
> 
> ```
> #define HAS_DRIVE_SPEC_1(dos_based, f)			\
>   ((f)[0] && ((f)[1] == ':') && (dos_based))
> 
> #define IS_ABSOLUTE_PATH_1(dos_based, f)		 \
>   (IS_DIR_SEPARATOR_1 (dos_based, (f)[0])		 \
>    || HAS_DRIVE_SPEC_1 (dos_based, f))
> ```
> 
> Any other condition must be false. I don't thin is worth to call `is_absolute`.
I've been giving this some thought, and looked at the code. I think it is important that we exactly match GNU's behaviour here, since we are specifically saying this function is intended to be GNU compatible. Consequently, I don't think any more that we should call `is_absolute` at all from this function, since that potentially has different behaviour and might accept things we don't want it to. Actually, in the current situation I think it is a strict subset of the behaviour we want, but I can't guarantee that going forward.


================
Comment at: llvm/lib/Support/Path.cpp:695
+  // Handle '/' which should be true regardless Windows or POSIX.
+  if (!P.empty() && is_separator(P.front(), Style::posix))
+    return true;
----------------
tinti wrote:
> I wanted to even avoid calling `is_separator`.
> 
> What do you think?
If we change this to be `is_separator(P.front(), Style)`, won't that mean we don't need the second check below? I feel very confident that we don't need to worry that `is_separator`'s behaviour will change.


================
Comment at: llvm/lib/Support/Path.cpp:704
+    // Handle drive letter pattern (i.e "c:") on Windows.
+    if (P.size() >= 2 && (P[0] && P[1] == ':'))
+      return true;
----------------
tinti wrote:
> rengolin wrote:
> > Neither `7:` nor `$:` are valid paths on Windows, I think.
> > 
> > Perhaps `std::isalpha(P[0])` would help.
> > Neither `7:` nor `$:` are valid paths on Windows, I think.
> > 
> > Perhaps `std::isalpha(P[0])` would help.
> 
> I thought it too but the code on GNU does not check `isalpha`.
> 
> ```
> #define HAS_DRIVE_SPEC_1(dos_based, f)			\
>   ((f)[0] && ((f)[1] == ':') && (dos_based))
> ```
> 
> The Wikipedia also points an extension [1].
> 
> 
> 
> ```
> If access to more filesystems than Z: is required under Windows NT, Volume Mount Points must be used.[11] However, it is possible to mount non-letter drives, such as 1:, 2:, or !: using the command line SUBST utility in Windows XP or later (i.e. SUBST 1: C:\TEMP), but it is not officially supported and may break programs that assume that all drives are letters A: to Z:.
> ```
> 
> There is code on LLDB that does the way you suggested [2].
> 
> [1] https://en.wikipedia.org/wiki/Drive_letter_assignment#Common_assignments
> [2] https://github.com/llvm/llvm-project/blob/bc9b6b33a0d5760370e72ae06c520c25aa8d61c6/lldb/source/Utility/FileSpec.cpp#L315
We should allow what GNU allows, even if it doesn't make much sense. It's unlikely to matter that much, and the code is simpler this way.


================
Comment at: llvm/unittests/Support/Path.cpp:95
+  Paths.emplace_back("foo", false, false);
+  Paths.emplace_back("c:", false, true);
+
----------------
You also need the following cases, I think:
```
/foo    // paths don't just have to be the directory separator
\\foo   // ditto (windows only)
c:\     // ditto (for drive letters)
!:      // drive char can be a non-letter
c       // drive char must be followed by ':'
:       // ':' without drive char is not a drive identifier
xx:     // drive identifier only accepted for single character
c:abc\\ // drive identifier without directory separator immediately following
```


================
Comment at: llvm/unittests/Support/Path.cpp:97
+
+  for (auto &Path : Paths) {
+    EXPECT_EQ(
----------------



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87667/new/

https://reviews.llvm.org/D87667



More information about the llvm-commits mailing list