[PATCH] D87732: [Support] Provide sys::path::guess_style

Reid Kleckner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 28 11:51:24 PDT 2020


rnk added a comment.

In D87732#2296149 <https://reviews.llvm.org/D87732#2296149>, @amccarth wrote:

> In D87732#2296076 <https://reviews.llvm.org/D87732#2296076>, @rnk wrote:
>
>> Separately, I would like LLVM to move in the direction of standardizing on forward slashes in internal representations and data structures.
>
> If we move that direction, we'll have to re-think VFS again.  I'll also lobby hard to make sure that we show the native representation in compiler output, like diagnostic messages.

Yes, we might have to adjust VFS usage, but most of the important users are in-tree, so this is doable. It is not something unfixable, like external libraries that won't accept forward slashes. And yes, I think diagnostics seem like a reasonable place to canonicalize path names for the user to whatever their preference is, but as Martin points out, the user's preference may be different depending on the environment.

>> We already convert long paths to UNC style before we call FS APIs on Windows. That's a good point to rewrite to backslashes if we need to. I don't believe any information or capabilities are lost: NTFS filenames may not contain `/` characters. Even if we receive a funky UNC-style path from a user, we can change the slash direction internally, and no information will be lost.
>
> Technical nit:  The `\\?\` prefix to bypass path parsing and access the NTFS namespace directly looks similar to a UNC path, but it's not a UNC path.
>
> I believe NTFS paths can contain almost any Unicode character, including ones the Win32 conventions choose to treat as reserved.  It simply becomes difficult to work with such outliers, but you can do almost anything with CreateFile and the `\\?\` escape prefix.  You can even name a file `..`.

This is similar to the way in which NTFS file names are not really UTF-16; they are UCS-2, and you can jam pretty much any 16-bit integer (maybe not 0?) into the file name if you like. It may not be possible to re-encode all filenames as UTF-8, but LLVM does it anyway, and nobody has complained about it yet, to my knowledge.
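
To make the boundary rewrite from the quoted exchange concrete, here is a minimal sketch in plain C++. The helper names `toWin32Separators` and `toInternalSeparators` are made up for illustration and are not existing LLVM APIs. The round trip is only lossless under the assumption stated above, namely that no filename component itself contains a `/`.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not an LLVM API): rewrite an internal forward-slash
// path to backslashes just before handing it to a Win32 file API.
std::string toWin32Separators(std::string Path) {
  std::replace(Path.begin(), Path.end(), '/', '\\');
  return Path;
}

// Hypothetical inverse: normalize a user-supplied path to the internal
// forward-slash form. Assumes no component contains a literal '/'.
std::string toInternalSeparators(std::string Path) {
  std::replace(Path.begin(), Path.end(), '\\', '/');
  return Path;
}
```

With this shape, everything between the two calls only ever sees forward slashes, and the native form exists only at the edges (FS calls and, if desired, diagnostics).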

> The Win32 namespace is built on top of the NT namespace, using symlinks to map Win32 naming conventions to devices, and some NT namespace device nodes are simply parents for the underlying filesystem (like NTFS).
>
> Is `x` a relative file name in Windows?  It could be.  But Windows APIs will treat it like a drive letter because it's only one character.  If the current path for the X: drive is the root directory, then `x` is effectively an absolute path.  That's why, to access a single-letter file name, you often have to type `.\x` or `x.` or even `x::$DATA`.  It's a real mess.  Similar surprises (and workarounds) await those who name their file like a DOS-style device name, like `AUX`, `CON`, `PRN`, `NUL` or `COM1`.

Maybe this is my non-Windows native-ness showing, but I think in all these cases it would be preferable to treat the names as vanilla, relative filenames, with the possible exception of NUL. I'm not sure I want `clang ... -o COM1` to try to write to a DOS device. It limits LLVM tools to the common denominator of FS operations shared between POSIX and Windows, but maybe we can accept those limitations.
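
As a rough sketch of what recognizing these names could look like, a tool might check each path component before deciding how to open it. The helper name `isDosDeviceName` is hypothetical and not an existing LLVM API. Treating a name with an extension (such as `NUL.txt`) as reserved is an assumption based on traditional Win32 behavior, and the check deliberately ignores other quirks such as trailing spaces.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper (not an LLVM API): report whether a single path
// component is one of the classic DOS device names, case-insensitively.
bool isDosDeviceName(std::string_view Component) {
  // Assumption: anything after the first dot is ignored, as Win32 has
  // traditionally done for names like "NUL.txt".
  std::string_view Stem = Component.substr(0, Component.find('.'));

  std::string Upper(Stem);
  for (char &C : Upper)
    C = static_cast<char>(std::toupper(static_cast<unsigned char>(C)));

  for (std::string_view Name : {"CON", "PRN", "AUX", "NUL"})
    if (Upper == Name)
      return true;

  // COM1-COM9 and LPT1-LPT9.
  if (Upper.size() == 4 &&
      (Upper.compare(0, 3, "COM") == 0 || Upper.compare(0, 3, "LPT") == 0))
    return Upper[3] >= '1' && Upper[3] <= '9';

  return false;
}
```

A caller that wants the "vanilla relative filename" behavior could use a check like this to prefix the component with `.\` (or go through the `\\?\` form) before opening it, rather than letting Win32 resolve it to a device.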

> My point is, there are a lot of pitfalls here.  Some we handle, many we don't.  But the more we try to canonicalize paths into a non-native style, the more traps we might encounter and the harder it will be to fix them.

That's true, but I think we stand to gain a lot by making it easier to write LLVM code and tests that are portable by default. Consider our UTF-8 internal string representation: I consider this decision to have resulted in a huge win for simplicity. LLVM may have too many string types, but we'd have twice as many if we had Unicode variants of them all. :)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87732/new/

https://reviews.llvm.org/D87732


