[LLVMdev] [cfe-dev] Unicode path handling on Windows

Nikola Smiljanic popizdeh at gmail.com
Fri Sep 23 09:57:43 PDT 2011


Here's a new patch that fixes all the issues mentioned before. Note that
this isn't final, I didn't want to replace all calls to ::stat so that it's
easier to review.

I have only one more questions:

1. _wopen accepts _stat64i32 instead of stat structure. These two are
exactly the same, the only difference is that stat uses time_t
and _stat64i32 uses __time64_t (time_t is a typedef for __time64_t but this
depends on one more macro). I check to see that size of these two structures
is the same, but is there something more that I could do? I'd also like to
use some kind of static assert instead of runtime assert.

I've also noticed that function LowercaseString (the same applies to
UppercaseString) doesn't work with UTF8 input. It's calling isupper and
tolower and these functions can't handle UTF8. This should get a new issue
probably?

On Wed, Sep 21, 2011 at 1:41 AM, Aaron Ballman <aaron at aaronballman.com>wrote:

> On Tue, Sep 20, 2011 at 4:15 PM, Nikola Smiljanic <popizdeh at gmail.com>
> wrote:
> > OK since this approach makes sense I'll shoot with my questions :)
> > 1. Where should get_utf8_argv go and is the name of this function OK?
> Right
> > now the function is inside llvm::sys::fs namespace because I need access
> to
> > Windows.h, should I leave it there.
>
> I don't think it belongs there as it has nothing to do with the file
> system.  However, I'm not familiar enough to know of a better place
> for it either.  Hopefully someone else can chime in with an idea.
>
> > 2. get_utf8_argv allocates new char** representing utf8 argv, should I
> > deallocate this inside driver.cpp, use std::vector<std::string> instead
> or
> > ignore the allocation completely?
>
> Perhaps some SmallVectors would make sense here?  An inner one to hold
> the arg data, and an outer one to hold the list of inner vectors.
> Then you can do the cleanup within the call.
>
> > 3. There is an #ifdef inside driver.cpp right now that I don't like.
> > get_utf8_argv works for windows only, but I could change the function to
> > accept argv and return it as vector of utf8 strings. It would convert
> char*
> > to std::string on unix and use GetCommandLine + utf8 conversion on
> windows.
>
> I have no concrete opinion one way or the other on this.
>
> > 4. Should I wrap functions like ::fstat and ::close for consistency even
> > though they don't work with paths?
>
> I don't believe so.
>
> > I'll fix everything that is formatting related. You're right, current
> patch
> > has Open only for windows but I'll add the other one as well.
>
> Thanks for the help!
>
> ~Aaron
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110923/ca69a292/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clang.patch
Type: application/octet-stream
Size: 2569 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110923/ca69a292/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm.patch
Type: application/octet-stream
Size: 4685 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110923/ca69a292/attachment-0001.obj>


More information about the llvm-dev mailing list