[LLVMdev] [cfe-dev] Unicode path handling on Windows
popizdeh at gmail.com
Wed Sep 28 22:11:01 PDT 2011
I have a problem with Lowercase and Uppercase functions. These are broken on
all platforms, not only Window, so I can't just #ifdef and use CharLowerBuffW.
I need a portable way to convert from UTF8 to UTF16. There is set of
functions inside clang/Basic/ConvertUTF, but LLVM can't depend on this. What
do I do?
On Tue, Sep 27, 2011 at 5:09 AM, Bryce Cogswell <bryceco at yahoo.com> wrote:
> I think the assert you have for _stat64i32 is fine. It is a constant
> expression so should compile to nothing, and the chance of the definition
> changing is pretty much zero.
> LowercaseString appears to be used by ASM printers where the output is
> expected to be ASCII, and then some WIN32-conditioned code in
> FileManager.cpp. I hate to say it but you'll probably need to convert the
> UTF-8 paths to wide char, lower case it using CharLowerBuffW (since it needs
> to match the casing rules used by NTFS), and then convert back to UTF-8.
> If you need to be pedantic about recognizing whether two paths are the same
> on Windows you also need to call GetFullPathName in order to expand any 8.3
> path components, but this is an expensive function so I wouldn't do it
> unless absolutely necessary.
> There is also a call to _fullpath in there that needs to be changed to
> The rest of the patch looks good.
> On Sep 23, 2011, at 9:57 AM, Nikola Smiljanic wrote:
> Here's a new patch that fixes all the issues mentioned before. Note that
> this isn't final, I didn't want to replace all calls to ::stat so that it's
> easier to review.
> I have only one more questions:
> 1. _wopen accepts _stat64i32 instead of stat structure. These two are
> exactly the same, the only difference is that stat uses time_t
> and _stat64i32 uses __time64_t (time_t is a typedef for __time64_t but this
> depends on one more macro). I check to see that size of these two structures
> is the same, but is there something more that I could do? I'd also like to
> use some kind of static assert instead of runtime assert.
> I've also noticed that function LowercaseString (the same applies to
> UppercaseString) doesn't work with UTF8 input. It's calling isupper and
> tolower and these functions can't handle UTF8. This should get a new issue
> On Wed, Sep 21, 2011 at 1:41 AM, Aaron Ballman <aaron at aaronballman.com>wrote:
>> On Tue, Sep 20, 2011 at 4:15 PM, Nikola Smiljanic <popizdeh at gmail.com>
>> > OK since this approach makes sense I'll shoot with my questions :)
>> > 1. Where should get_utf8_argv go and is the name of this function OK?
>> > now the function is inside llvm::sys::fs namespace because I need access
>> > Windows.h, should I leave it there.
>> I don't think it belongs there as it has nothing to do with the file
>> system. However, I'm not familiar enough to know of a better place
>> for it either. Hopefully someone else can chime in with an idea.
>> > 2. get_utf8_argv allocates new char** representing utf8 argv, should I
>> > deallocate this inside driver.cpp, use std::vector<std::string> instead
>> > ignore the allocation completely?
>> Perhaps some SmallVectors would make sense here? An inner one to hold
>> the arg data, and an outer one to hold the list of inner vectors.
>> Then you can do the cleanup within the call.
>> > 3. There is an #ifdef inside driver.cpp right now that I don't like.
>> > get_utf8_argv works for windows only, but I could change the function to
>> > accept argv and return it as vector of utf8 strings. It would convert
>> > to std::string on unix and use GetCommandLine + utf8 conversion on
>> I have no concrete opinion one way or the other on this.
>> > 4. Should I wrap functions like ::fstat and ::close for consistency even
>> > though they don't work with paths?
>> I don't believe so.
>> > I'll fix everything that is formatting related. You're right, current
>> > has Open only for windows but I'll add the other one as well.
>> Thanks for the help!
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-dev