[llvm-dev] RFC: Support for preferring paths with forward slashes on Windows

Martin Storsjö via llvm-dev llvm-dev at lists.llvm.org
Thu Oct 14 13:53:48 PDT 2021


On Thu, 14 Oct 2021, Chris Tetreault wrote:

> I could be mistaken, but I believe that since the dawn of time, Windows 
> has just secretly supported forward slashes. A quick google search does 
> not turn up any Microsoft docs stating that this is true, but I've heard 
> rumors that it's been this way since DOS. On my Windows 10 machine, 
> Powershell accepts /, cmd.exe accepts /, and Visual Studio accepts /.

Yes, overall most APIs that take paths can take either form, but in my 
experience, cmd.exe pretty exclusively requires backslashes.

> Whomever takes it upon themselves to work on this should test 
> extensively before committing code. I would probably feel better if 
> somebody could dig up some authoritative source on this.

I don't think this aspect is anything new/controversial wrt LLVM so far; 
it can take paths that use forward slashes (if given such paths) in a 
number of places and pass them through pretty much as-is to the underlying 
APIs.

But most cases where we take a path and feed it to an underlying API are 
well centralised in a single function, which takes our char based UTF-8 
paths and widens them to UTF-16 wchar_t, before passing them to the actual 
Win32 APIs in that form. Currently, that function forces the paths to 
backslash form in certain cases (when it needs to prepend a \\?\ prefix for 
long paths), but if we felt wary about it we could make it always force 
them to backslash form.
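
As a rough illustration of the behaviour described above, here is a 
simplified sketch. It is not LLVM's actual code: the name prepareForWin32, 
the 260 character limit and the alwaysBackslash switch are assumptions made 
for the example, the UTF-8 to UTF-16 widening step is left out, and only 
plain absolute drive-letter paths are considered (no UNC or relative paths).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: prepares a UTF-8 path for a Win32 call. A real
// implementation would also widen the result to UTF-16 wchar_t.
std::string prepareForWin32(std::string path, std::size_t maxPathLen = 260,
                            bool alwaysBackslash = false) {
  // Long paths need the \\?\ prefix, which disables Win32 path
  // normalization, so forward slashes must be converted by hand.
  const bool needsLongPrefix = path.size() >= maxPathLen;

  if (needsLongPrefix || alwaysBackslash)
    std::replace(path.begin(), path.end(), '/', '\\');

  // Prepend the prefix only if it is not already present.
  if (needsLongPrefix && path.rfind("\\\\?\\", 0) != 0)
    path.insert(0, "\\\\?\\");

  return path;
}
```

With the default arguments a short path such as c:/dir/source.c is passed 
through untouched; setting alwaysBackslash corresponds to the "always force 
them to backslash form" option.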

So the fact that we can pass paths with forward slashes to Win32 APIs is a 
preexisting condition and nothing that my patch set would change, 
essentially - it'd just do it more often than before.

> Assuming that this is the case, it would probably be nice if any paths 
> we take in were just immediately canonicalized to use / and all paths 
> just have forward slash. I know we have a ton of tests that have this 
> `{(/|\\)}` regex in them, and it would be nice if we could just not do 
> that.

If desired, that could be a later goal - that's a couple steps further 
than what I aimed for so far though.
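
To make the cost of that alternation concrete, here is a small sketch of 
what such a check amounts to. It uses std::regex as a stand-in for the test 
tooling, and the function name and the dir/source.c path are assumptions 
made for the example.

```cpp
#include <regex>
#include <string>

// Hypothetical check: true if `line` mentions dir<sep>source.c, where <sep>
// may be either a forward slash or a backslash.
bool mentionsSourceFile(const std::string &line) {
  static const std::regex pattern(R"(dir(/|\\)source\.c)");
  return std::regex_search(line, pattern);
}
```

If every printed path used one separator, the (/|\\) group could be 
replaced by a literal character.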

Right now, my patchset canonicalizes paths that are made up internally 
(functions like current_path(), getMainExecutable(), findProgramByName(), 
and how InitLLVM() sets argv[0]) and uses the preferred separator wherever 
paths are assembled in code, but in many cases, paths are taken in and 
passed around in the user-provided form too. Given the full interface of 
e.g. Clang, there's a huge number of different places where paths can be 
provided (there are dozens of command line options that take paths as 
arguments).
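
The distinction between canonicalizing internally generated paths and 
passing user-provided ones through can be sketched roughly as follows. All 
names here (SeparatorStyle, canonicalize, appendComponent) are hypothetical 
and chosen for the example; they are not the patchset's API.

```cpp
#include <algorithm>
#include <string>

// Hypothetical configuration knob: which separator is preferred.
enum class SeparatorStyle { Backslash, ForwardSlash };

char preferredSeparator(SeparatorStyle style) {
  return style == SeparatorStyle::ForwardSlash ? '/' : '\\';
}

// For paths made up internally (e.g. the current directory or the path of
// the running executable): rewrite every separator to the preferred one.
std::string canonicalize(std::string path, SeparatorStyle style) {
  const char preferred = preferredSeparator(style);
  const char other = preferred == '/' ? '\\' : '/';
  std::replace(path.begin(), path.end(), other, preferred);
  return path;
}

// For paths assembled in code: join with the preferred separator, but leave
// the existing (possibly user-provided) part of the path as it was given.
std::string appendComponent(std::string dir, const std::string &name,
                            SeparatorStyle style) {
  if (!dir.empty() && dir.back() != '/' && dir.back() != '\\')
    dir += preferredSeparator(style);
  return dir + name;
}
```

Note that appendComponent on a user-provided c:\dir with forward slashes 
preferred yields the mixed form c:\dir/source.c, which is the kind of 
result that full canonicalization on input would avoid.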

Also, judging from both GCC and MSVC, neither of them seems to canonicalize 
paths on input. If I call either of them with e.g. c:\dir\source.c or 
c:/dir/source.c, then the warnings emitted from that file are printed with 
slashes in the exact form I input.

But in any case, regardless of how far we want to go with 
canonicalizations in either form, the patchset I've started on is, provided 
that others agree on the design, a first step towards being able to use 
forward slashes. It can be applied gradually, with the preference itself 
switched only at the end.

// Martin


