[cfe-dev] [LLVMdev] Unicode path handling on Windows

Nikola Smiljanic popizdeh at gmail.com
Wed Sep 28 22:11:01 PDT 2011


I have a problem with Lowercase and Uppercase functions. These are broken on
all platforms, not only Window, so I can't just #ifdef and use CharLowerBuffW.
I need a portable way to convert from UTF8 to UTF16. There is set of
functions inside clang/Basic/ConvertUTF, but LLVM can't depend on this. What
do I do?

On Tue, Sep 27, 2011 at 5:09 AM, Bryce Cogswell <bryceco at yahoo.com> wrote:

> I think the assert you have for _stat64i32 is fine. It is a constant
> expression so should compile to nothing, and the chance of the definition
> changing is pretty much zero.
>
> LowercaseString appears to be used by ASM printers where the output is
> expected to be ASCII, and then some WIN32-conditioned code in
> FileManager.cpp. I hate to say it but you'll probably need to convert the
> UTF-8 paths to wide char, lower case it using CharLowerBuffW (since it needs
> to match the casing rules used by NTFS), and then convert back to UTF-8.
>
> If you need to be pedantic about recognizing whether two paths are the same
> on Windows you also need to call GetFullPathName in order to expand any 8.3
> path components, but this is an expensive function so I wouldn't do it
> unless absolutely necessary.
>
> There is also a call to _fullpath in there that needs to be changed to
> _wfullpath.
>
> The rest of the patch looks good.
>
> On Sep 23, 2011, at 9:57 AM, Nikola Smiljanic wrote:
>
> Here's a new patch that fixes all the issues mentioned before. Note that
> this isn't final, I didn't want to replace all calls to ::stat so that it's
> easier to review.
>
> I have only one more questions:
>
> 1. _wopen accepts _stat64i32 instead of stat structure. These two are
> exactly the same, the only difference is that stat uses time_t
> and _stat64i32 uses __time64_t (time_t is a typedef for __time64_t but this
> depends on one more macro). I check to see that size of these two structures
> is the same, but is there something more that I could do? I'd also like to
> use some kind of static assert instead of runtime assert.
>
> I've also noticed that function LowercaseString (the same applies to
> UppercaseString) doesn't work with UTF8 input. It's calling isupper and
> tolower and these functions can't handle UTF8. This should get a new issue
> probably?
>
> On Wed, Sep 21, 2011 at 1:41 AM, Aaron Ballman <aaron at aaronballman.com>wrote:
>
>> On Tue, Sep 20, 2011 at 4:15 PM, Nikola Smiljanic <popizdeh at gmail.com>
>> wrote:
>> > OK since this approach makes sense I'll shoot with my questions :)
>> > 1. Where should get_utf8_argv go and is the name of this function OK?
>> Right
>> > now the function is inside llvm::sys::fs namespace because I need access
>> to
>> > Windows.h, should I leave it there.
>>
>> I don't think it belongs there as it has nothing to do with the file
>> system.  However, I'm not familiar enough to know of a better place
>> for it either.  Hopefully someone else can chime in with an idea.
>>
>> > 2. get_utf8_argv allocates new char** representing utf8 argv, should I
>> > deallocate this inside driver.cpp, use std::vector<std::string> instead
>> or
>> > ignore the allocation completely?
>>
>> Perhaps some SmallVectors would make sense here?  An inner one to hold
>> the arg data, and an outer one to hold the list of inner vectors.
>> Then you can do the cleanup within the call.
>>
>> > 3. There is an #ifdef inside driver.cpp right now that I don't like.
>> > get_utf8_argv works for windows only, but I could change the function to
>> > accept argv and return it as vector of utf8 strings. It would convert
>> char*
>> > to std::string on unix and use GetCommandLine + utf8 conversion on
>> windows.
>>
>> I have no concrete opinion one way or the other on this.
>>
>> > 4. Should I wrap functions like ::fstat and ::close for consistency even
>> > though they don't work with paths?
>>
>> I don't believe so.
>>
>> > I'll fix everything that is formatting related. You're right, current
>> patch
>> > has Open only for windows but I'll add the other one as well.
>>
>> Thanks for the help!
>>
>> ~Aaron
>>
>
> <clang.patch><llvm.patch>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20110929/34cc798b/attachment.html>


More information about the cfe-dev mailing list