[cfe-dev] Unicode path handling on Windows

Nikola Smiljanic popizdeh at gmail.com
Wed Aug 31 10:58:52 PDT 2011


_wopen expects wchar_t*, and the only visible function for converting to
UTF-16 is ConvertUTF8toUTF16, which converts to unsigned shorts. There is a
function that does exactly what I need, UTF8ToUTF16, but it's inside an
anonymous namespace in the Windows version of PathV2.inc.

I could solve this in a number of ways, but I'm not sure which one is
preferred in the Clang codebase.
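
For reference, one of the options could look roughly like the sketch below:
a small standalone UTF-8 to UTF-16 converter whose result is handed to
_wopen. This is only an illustration, not code from LLVM or Clang. The names
utf8ToUtf16 and openUtf8Path are hypothetical, and the Windows-only part
assumes wchar_t is 16 bits wide, which holds for MSVC.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

#ifdef _WIN32
#include <io.h> // _wopen
#endif

// Hypothetical helper (not an LLVM API): decodes UTF-8 into UTF-16 code
// units. Returns false on malformed input: bad lead or continuation byte,
// truncated sequence, overlong form, surrogate, or value above U+10FFFF.
bool utf8ToUtf16(const std::string &In, std::u16string &Out) {
  Out.clear();
  for (std::size_t I = 0, E = In.size(); I < E;) {
    unsigned char B = static_cast<unsigned char>(In[I]);
    std::uint32_t CP;
    std::size_t Len;
    if (B < 0x80) {
      CP = B;
      Len = 1;
    } else if ((B & 0xE0) == 0xC0) {
      CP = B & 0x1F;
      Len = 2;
    } else if ((B & 0xF0) == 0xE0) {
      CP = B & 0x0F;
      Len = 3;
    } else if ((B & 0xF8) == 0xF0) {
      CP = B & 0x07;
      Len = 4;
    } else {
      return false; // stray continuation byte or invalid lead byte
    }
    if (E - I < Len)
      return false; // sequence runs past the end of the input
    for (std::size_t K = 1; K < Len; ++K) {
      unsigned char C = static_cast<unsigned char>(In[I + K]);
      if ((C & 0xC0) != 0x80)
        return false; // not a continuation byte
      CP = (CP << 6) | (C & 0x3F);
    }
    // Smallest code point that legitimately needs Len bytes.
    static const std::uint32_t MinCP[] = {0, 0, 0x80, 0x800, 0x10000};
    if (CP < MinCP[Len] || CP > 0x10FFFF || (CP >= 0xD800 && CP <= 0xDFFF))
      return false;
    if (CP < 0x10000) {
      Out.push_back(static_cast<char16_t>(CP));
    } else {
      CP -= 0x10000; // encode as a surrogate pair
      Out.push_back(static_cast<char16_t>(0xD800 + (CP >> 10)));
      Out.push_back(static_cast<char16_t>(0xDC00 + (CP & 0x3FF)));
    }
    I += Len;
  }
  return true;
}

#ifdef _WIN32
// Hypothetical wrapper: opens a UTF-8 encoded path through the wide CRT API.
// Returns -1 if the path is not valid UTF-8 or if _wopen itself fails.
int openUtf8Path(const std::string &Path, int Flags) {
  std::u16string Units;
  if (!utf8ToUtf16(Path, Units))
    return -1;
  // Assumption: wchar_t holds one UTF-16 code unit (true on Windows).
  std::wstring Wide(Units.begin(), Units.end());
  return ::_wopen(Wide.c_str(), Flags);
}
#endif
```

For example, the UTF-8 bytes E4 B8 AD (U+4E2D) convert to the single code
unit 0x4E2D, while the same character in a GBK-encoded argv (bytes D6 D0) is
rejected as malformed UTF-8, which matches the failure described in the
quoted messages below.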

On Thu, Aug 25, 2011 at 1:25 AM, Ruben Van Boxem
<vanboxem.ruben at gmail.com> wrote:

> On 25 Aug 2011 00:08, "Nikola Smiljanic" <popizdeh at gmail.com> wrote:
>
> >
> > I'm trying to fix Unicode file handling on Windows
> > (http://llvm.org/bugs/show_bug.cgi?id=10348). This currently doesn't work
> > because argv is encoded as a multibyte string (the clang project is
> > configured this way).
> >
> > Michael suggested converting the command line to UTF-8, and this indeed
> > fixes the error that the driver emits, but there is another check in
> > CompilerInstance that fails because FileSystemStatCache::get calls ::open,
> > and I'm guessing that this function is not smart enough to handle a UTF-8
> > path on Windows? Any ideas?
>
> It's not smart enough, no, but you can use _wfopen instead. Note that all of
> its arguments are wchar_t*.
>
> Ruben
>
> >
> > I have one more question. I added a MultibyteToUTF8 function to PathV2.inc
> > (Windows version), and now I'd like to call it from ExpandArgv
> > (driver.cpp), but this code is platform-specific and isn't visible (the
> > function is inside an anonymous namespace). I could create a wrapper
> > function that calls this function on Windows and does nothing on other
> > platforms. Is this the way to go, and where should I put it
> > (llvm::sys::fs, llvm::sys::path, or somewhere else)?
> >
> > ---------- Forwarded message ----------
> > From: Michael Spencer <bigcheesegs at gmail.com>
> > Date: Sat, Jul 16, 2011 at 12:32 AM
> > Subject: Re: Question regarding Clang path handling on Windows
> > To: Nikola Smiljanic <popizdeh at gmail.com>
> >
> >
> > On Thu, Jul 14, 2011 at 8:25 AM, Nikola Smiljanic <popizdeh at gmail.com>
> > wrote:
> > > Hi Michael, I'd like to fix this bug if I can:
> > > http://llvm.org/bugs/show_bug.cgi?id=10348. I started looking around and
> > > I think I know where the problem is (your name showed up in the svn log
> > > for PathV2.inc), but I'm not sure what the right way to solve it is.
> > > Namely, the check in Driver::BuildActions (line 772) fails. It seems
> > > that the function llvm::sys::fs::exists tries to convert the input
> > > string from UTF-8 to UTF-16, but clang.exe is compiled with the
> > > Multi-Byte Character Set option. This means that the conversion will
> > > succeed when you pass an ANSI string that is also a valid UTF-8 string.
> > > But if you pass in some Chinese characters, you'll get a
> > > single-byte-character (char) string that is interpreted using the
> > > current Windows locale, and in this case the conversion from UTF-8 to
> > > UTF-16 will fail (the char values are negative, i.e. they have the high
> > > bit set). So my question is whether this function should do the
> > > conversion at all; maybe there are other places in the code that call it
> > > with UTF-8 input obtained from some Windows function? In this
> > > particular case, the conversion should be done using the current locale.
> > > I'd like to hear what you think.
> >
> > This was an oversight on my part. I assumed the command line would be
> > in utf8 for some reason. Clang internals currently assume utf8, so the
> > correct fix is to convert the command line to utf8 first. I'll look
> > into adding this.
> >
> > - Michael Spencer
> >
> >
> >
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> >
>
>
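
The wrapper asked about in the quoted message above (a function that converts
on Windows and does nothing on other platforms) could look roughly like the
following sketch. The name nativeToUTF8 is hypothetical and is not an
existing llvm::sys function; the thread does not settle where such a function
should live. The Windows branch assumes the input is encoded in the active
ANSI code page (CP_ACP) and uses the Win32 calls MultiByteToWideChar and
WideCharToMultiByte; returning the input unchanged on failure is a choice
made for this sketch only.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper: converts a string from the active Windows code page
// to UTF-8. On other platforms the input is returned unchanged, on the
// assumption that argv is already UTF-8 there.
std::string nativeToUTF8(const std::string &In) {
#ifdef _WIN32
  if (In.empty())
    return In;
  const int InLen = static_cast<int>(In.size());

  // First pass: ask how many UTF-16 code units are needed.
  int WideLen = ::MultiByteToWideChar(CP_ACP, 0, In.data(), InLen, nullptr, 0);
  if (WideLen <= 0)
    return In; // sketch only: fall back to the original bytes
  std::wstring Wide(static_cast<std::size_t>(WideLen), L'\0');
  ::MultiByteToWideChar(CP_ACP, 0, In.data(), InLen, &Wide[0], WideLen);

  // Second pass: UTF-16 to UTF-8. The last two arguments must be null
  // when the target code page is CP_UTF8.
  int Utf8Len = ::WideCharToMultiByte(CP_UTF8, 0, Wide.data(), WideLen,
                                      nullptr, 0, nullptr, nullptr);
  if (Utf8Len <= 0)
    return In;
  std::string Out(static_cast<std::size_t>(Utf8Len), '\0');
  ::WideCharToMultiByte(CP_UTF8, 0, Wide.data(), WideLen, &Out[0], Utf8Len,
                        nullptr, nullptr);
  return Out;
#else
  return In;
#endif
}
```

A caller such as ExpandArgv could then pass every argument through
nativeToUTF8 before handing it to the rest of the driver, keeping the
platform check in one place instead of at each call site.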