[cfe-dev] Unicode path handling on Windows

Nikola Smiljanic popizdeh at gmail.com
Wed Aug 24 15:05:46 PDT 2011


I'm trying to fix Unicode file handling on Windows
(http://llvm.org/bugs/show_bug.cgi?id=10348). This currently doesn't work
because argv is encoded as a multibyte string (the clang project is
configured this way).
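
To illustrate why a multibyte argv breaks code that expects UTF-8, here is a
minimal sketch of a structural UTF-8 check. The function name looksLikeUTF8 is
hypothetical and not part of LLVM, and the sample bytes assume a path typed
under code page 936 (GBK), where the character U+4E2D is encoded as 0xD6 0xD0
rather than the UTF-8 sequence 0xE4 0xB8 0xAD.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only (not an LLVM API).
// Returns true if `s` is structurally valid UTF-8: every lead byte is
// followed by the right number of 10xxxxxx continuation bytes.
// Simplified: it does not reject overlong encodings or surrogates.
bool looksLikeUTF8(const std::string &s) {
  std::size_t i = 0;
  while (i < s.size()) {
    unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t extra = 0;
    if (lead < 0x80)
      extra = 0; // plain ASCII
    else if ((lead & 0xE0) == 0xC0)
      extra = 1; // 110xxxxx: two-byte sequence
    else if ((lead & 0xF0) == 0xE0)
      extra = 2; // 1110xxxx: three-byte sequence
    else if ((lead & 0xF8) == 0xF0)
      extra = 3; // 11110xxx: four-byte sequence
    else
      return false; // stray continuation byte or invalid lead byte

    if (s.size() - i <= extra)
      return false; // sequence is truncated

    for (std::size_t k = 1; k <= extra; ++k) {
      unsigned char c = static_cast<unsigned char>(s[i + k]);
      if ((c & 0xC0) != 0x80)
        return false; // not a continuation byte
    }
    i += extra + 1;
  }
  return true;
}
```

A pure ASCII path such as "main.cpp" passes this check in any code page, which
is why the problem only shows up with non-ASCII file names.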

Michael suggested converting the command line to UTF-8, and this does fix
the error that the driver emits. But another check in CompilerInstance
fails, because FileSystemStatCache::get calls ::open, and I'm guessing that
this function can't handle a UTF-8 path on Windows. Any ideas?

I have one more question. I added a MultibyteToUTF8 function to PathV2.inc
(the Windows version) and now I'd like to call it from ExpandArgv
(driver.cpp), but this code is platform-specific and isn't visible (the
function is inside an anonymous namespace). I could create a wrapper function
that calls this function on Windows and does nothing on other platforms. Is
this the way to go, and where should I put it (llvm::sys::fs,
llvm::sys::path, or somewhere else)?
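
A rough sketch of what such a wrapper could look like, assuming the
conversion goes from the active ANSI code page through UTF-16 to UTF-8. The
name nativeToUTF8 is hypothetical, and the body stands in for the
MultibyteToUTF8 function mentioned above, whose actual implementation is not
shown in this message.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper (name and placement are assumptions): converts a
// string from the active Windows code page to UTF-8, and returns the input
// unchanged on other platforms.
std::string nativeToUTF8(const std::string &native) {
#ifdef _WIN32
  if (native.empty())
    return std::string();
  const int srcLen = static_cast<int>(native.size());

  // Step 1: active ANSI code page -> UTF-16.
  int wideLen =
      ::MultiByteToWideChar(CP_ACP, 0, native.data(), srcLen, nullptr, 0);
  if (wideLen == 0)
    return native; // conversion failed; fall back to the original bytes
  std::wstring wide(static_cast<std::size_t>(wideLen), L'\0');
  ::MultiByteToWideChar(CP_ACP, 0, native.data(), srcLen, &wide[0], wideLen);

  // Step 2: UTF-16 -> UTF-8.
  int utf8Len = ::WideCharToMultiByte(CP_UTF8, 0, wide.data(), wideLen,
                                      nullptr, 0, nullptr, nullptr);
  if (utf8Len == 0)
    return native;
  std::string utf8(static_cast<std::size_t>(utf8Len), '\0');
  ::WideCharToMultiByte(CP_UTF8, 0, wide.data(), wideLen, &utf8[0], utf8Len,
                        nullptr, nullptr);
  return utf8;
#else
  return native; // nothing to do outside Windows
#endif
}
```

Keeping the #ifdef inside a single function means callers such as ExpandArgv
would not need any platform checks of their own.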

---------- Forwarded message ----------
From: Michael Spencer <bigcheesegs at gmail.com>
Date: Sat, Jul 16, 2011 at 12:32 AM
Subject: Re: Question regarding Clang path handling on Windows
To: Nikola Smiljanic <popizdeh at gmail.com>


On Thu, Jul 14, 2011 at 8:25 AM, Nikola Smiljanic <popizdeh at gmail.com> wrote:
> Hi Michael, I'd like to fix this bug if I can:
> http://llvm.org/bugs/show_bug.cgi?id=10348. I started looking around and I
> think I know where the problem is (your name showed up in svn log for
> PathV2.inc), but I'm not sure what the right way to solve it is. Namely, the
> check in Driver::BuildActions (line 772) fails. It seems that the function
> llvm::sys::fs::exists tries to convert the input string from UTF-8 to
> UTF-16, but clang.exe is compiled with the Multibyte Character Set. This
> means that the conversion will succeed when you pass an ANSI string that is
> also a valid UTF-8 string. But if you try to pass in some Chinese characters
> you'll get a single-byte character string that is interpreted using the
> current Windows locale, and in this case the conversion from UTF-8 to UTF-16
> will fail (character values are negative). So my question is whether this
> function should do the conversion at all. Maybe there are other places in
> the code that can call it with UTF-8 input that is obtained from some
> Windows function? In this particular case, the conversion should be done
> using the current locale. I'd like to hear what you think.

This was an oversight on my part. I assumed the command line would be
in UTF-8 for some reason. Clang internals currently assume UTF-8, so the
correct fix is to convert the command line to UTF-8 first. I'll look
into adding this.

- Michael Spencer