[LLVMdev] [cfe-dev] Unicode path handling on Windows

Nikola Smiljanic popizdeh at gmail.com
Tue Sep 20 09:52:36 PDT 2011


I spent some more time on this. My idea was to use functionality from
llvm::sys::fs like file_status instead of stat struct, but as it turns out
this is not really possible. file_status structure is not a replacement for
stat, nor are there functions inside llvm::sys::fs that can replace calls to
::stat and ::open.

The only solution I can see is to create wrappers around ::stat, ::open,
etc. and use them in the codebase instead of direct calls. Unix
implementation would only forward calls to the right function. Windows
implementation would convert path from utf8 to utf16 and call the
appropriate function (::_wopen instead of ::open and so on). Is this
solution acceptable?

Here's a patch where I tried this (it has a number of issues but I'd like to
hear from someone that this solution makes sense before I try to address
them).

On Wed, Sep 7, 2011 at 9:39 AM, Eli Friedman <eli.friedman at gmail.com> wrote:

> On Wed, Sep 7, 2011 at 12:20 AM, Nikola Smiljanic <popizdeh at gmail.com>
> wrote:
> > So what are you exactly saying? Somebody proposed using GetCommandLine
> > instead of using argv directly. And what about my other points about
> ::open
> > and ::stat?
>
> Because of the way Windows works, the only fully correct solution is
> to never touch the native charset, and keep everything in UTF-8/16 all
> the time.  Anything else is going to lead to bugs like "clang fails to
> find my file in a directory whose name has a Greek letter in it".  If
> we're going to be spending time messing with this code, we should take
> the time to get it right.  If it requires some extra refactoring to
> make that work, that's acceptable...
>
> -Eli
>
> > On Wed, Sep 7, 2011 at 8:52 AM, Eli Friedman <eli.friedman at gmail.com>
> wrote:
> >>
> >> On Tue, Sep 6, 2011 at 11:28 PM, Nikola Smiljanic <popizdeh at gmail.com>
> >> wrote:
> >> > The problem is not in the functions that return multibyte strings (the
> >> > multibyte string is coming from argv)
> >>
> >> argv is implicitly the return from a UTF16->multibyte conversion (i.e.
> >> it's lossy).
> >>
> >> -Eli
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110920/711ea316/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clang.patch
Type: application/octet-stream
Size: 2100 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110920/711ea316/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm.patch
Type: application/octet-stream
Size: 2670 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110920/711ea316/attachment-0001.obj>


More information about the llvm-dev mailing list