[PATCH] D87732: [Support] Provide sys::path::guess_style

Adrian McCarthy via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 25 17:25:21 PDT 2020


amccarth added a comment.

In D87732#2296076 <https://reviews.llvm.org/D87732#2296076>, @rnk wrote:

> Separately, I would like LLVM to move in the direction of standardizing on forward slashes in internal representations and data structures.

If we move in that direction, we'll have to re-think VFS again.  I'll also lobby hard to make sure that we show the native representation in compiler output, like diagnostic messages.

> We already convert long paths to UNC style before we call FS APIs on Windows. That's a good point to rewrite to backslashes if we need to. I don't believe any information or capabilities are lost: NTFS filenames may not contain `/` characters. Even if we receive a funky UNC-style path from a user, we can change the slash direction internally, and no information will be lost.

Technical nit:  The `\\?\` prefix to bypass path parsing and access the NTFS namespace directly looks similar to a UNC path, but it's not a UNC path.
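
To make the distinction concrete, here is a minimal sketch that tells the prefixes apart purely by their leading characters.  `PrefixKind` and `classifyPrefix` are hypothetical names made up for illustration, not anything in llvm/Support, and the match is case-sensitive for simplicity.  It assumes C++17.

```cpp
#include <string_view>

// Hypothetical classification, for illustration only.
enum class PrefixKind {
  None,        // no special prefix, e.g. C:\dir or relative\path
  Unc,         // \\server\share\...
  ExtendedUnc, // \\?\UNC\server\share\...
  Extended,    // \\?\C:\...  (Win32 path parsing is bypassed)
  Device       // \\.\...     (Win32 device namespace)
};

constexpr bool startsWith(std::string_view S, std::string_view Prefix) {
  return S.substr(0, Prefix.size()) == Prefix;
}

constexpr PrefixKind classifyPrefix(std::string_view Path) {
  // Check the longest prefixes first: every one of these starts with "\\".
  if (startsWith(Path, R"(\\?\UNC\)"))
    return PrefixKind::ExtendedUnc;
  if (startsWith(Path, R"(\\?\)"))
    return PrefixKind::Extended;
  if (startsWith(Path, R"(\\.\)"))
    return PrefixKind::Device;
  if (startsWith(Path, R"(\\)"))
    return PrefixKind::Unc;
  return PrefixKind::None;
}
```

Any code that rewrites slash direction would probably want to special-case the `Extended` kinds, since Win32 does not translate `/` to `\` for paths that start with `\\?\`.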

I believe NTFS paths can contain almost any Unicode character, including ones the Win32 conventions choose to treat as reserved.  It simply becomes difficult to work with such outliers, but you can do almost anything with CreateFile and the `\\?\` escape prefix.  You can even name a file `..`.

The Win32 namespace is built on top of the NT namespace, using symlinks to map Win32 naming conventions to devices, and some NT namespace device nodes are simply parents for the underlying filesystem (like NTFS).

Is `x` a relative file name in Windows?  It could be.  But Windows APIs will treat it like a drive letter because it's only one character.  If the current path for the X: drive is the root directory, then `x` is effectively an absolute path.  That's why, to access a single-letter file name, you often have to type `.\x` or `x.` or even `x::$DATA`.  It's a real mess.  Similar surprises (and workarounds) await those who name their file like a DOS-style device name, like `AUX`, `CON`, `PRN`, `NUL` or `COM1`.
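
As a rough illustration of the device-name trap, the sketch below checks whether a single path component collides with one of the classic DOS device names.  `isReservedDeviceName` is a hypothetical helper, not an existing LLVM function, and the list is deliberately incomplete: it assumes the traditional Win32 behavior of ignoring case and anything after the first dot, and it does not handle trailing spaces or other corner cases.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper: returns true if Name (one path component, not a whole
// path) matches a classic DOS device name under the assumptions noted above.
bool isReservedDeviceName(std::string_view Name) {
  // Drop everything from the first '.' onward, so "NUL.txt" is checked as "NUL".
  std::string Base(Name.substr(0, Name.find('.')));
  // Device names are matched case-insensitively.
  std::transform(Base.begin(), Base.end(), Base.begin(), [](unsigned char C) {
    return static_cast<char>(std::toupper(C));
  });

  static constexpr std::array<std::string_view, 4> Fixed = {"AUX", "CON", "PRN",
                                                            "NUL"};
  if (std::find(Fixed.begin(), Fixed.end(), std::string_view(Base)) !=
      Fixed.end())
    return true;

  // COM1..COM9 and LPT1..LPT9.
  if (Base.size() == 4 &&
      (Base.compare(0, 3, "COM") == 0 || Base.compare(0, 3, "LPT") == 0))
    return Base[3] >= '1' && Base[3] <= '9';

  return false;
}
```

A check like this only flags the problem.  Actually opening such a file still needs one of the workarounds mentioned above, such as the `\\?\` prefix.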

My point is, there are a lot of pitfalls here.  Some we handle, many we don't.  But the more we try to canonicalize paths into a non-native style, the more traps we might encounter and the harder it will be to fix them.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87732/new/

https://reviews.llvm.org/D87732


