[cfe-dev] Unicode path handling on Windows

Ruben Van Boxem vanboxem.ruben at gmail.com
Thu Sep 1 15:04:51 PDT 2011


On 1 Sep 2011 at 23:19, "Seth Cantrell" <seth.cantrell at gmail.com> wrote:
>
> One issue is that filenames on Windows can include Unicode characters not
> supported by the current code page, so the filenames in const char *argv[]
> aren't necessarily usable. The solution is to avoid argv and instead use the
> Windows API:
>
> #ifdef _WIN32
> #include <windows.h>  // must come before shellapi.h
> #include <shellapi.h> // for CommandLineToArgvW
> #endif
> #include <codecvt>
> #include <iostream>
> #include <locale>
> #include <string>
> #include <vector>
>
> int main() {
> #ifdef _WIN32
>
> // get UTF-16 encoded wchar_t arguments
>
>     int argc;
>     LPWSTR *szArglist = CommandLineToArgvW(GetCommandLineW(), &argc);
>     if (NULL == szArglist) {
>         std::cerr << "CommandLineToArgvW failed\n";
>         return 1;
>     }
>
> // convert to UTF-8 encoded char arguments (C++11)
>
>     std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
>     std::vector<std::string> args;
>     for (int i = 0; i < argc; ++i) {
>         args.push_back(convert.to_bytes(szArglist[i]));
>     }
>     LocalFree(szArglist); // the array returned by CommandLineToArgvW must be freed
> #endif // _WIN32
>
> }

Windows has an API for that. There is an almost ready-made solution here:
http://stackoverflow.com/questions/2181205/utf-8-to-from-utf-16-problem
(read the answer as well for the proper function call; I don't have access
to my own implementation right now). No need for fancy C++11 here.
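
A minimal sketch of the Win32-only route, assuming the goal is the same as in
the quoted code (program arguments as UTF-8 strings). The helper names
utf16_to_utf8 and utf8_args are hypothetical, chosen for illustration, and are
not taken from the linked answer or from any existing implementation. The
non-Windows branch assumes argv is already a UTF-8 byte string.

```cpp
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#include <shellapi.h> // CommandLineToArgvW

// Hypothetical helper: UTF-16 -> UTF-8 via WideCharToMultiByte.
static std::string utf16_to_utf8(const wchar_t *wide) {
    // With a length of -1 the input is treated as NUL-terminated and the
    // returned size includes the terminating NUL.
    int size = WideCharToMultiByte(CP_UTF8, 0, wide, -1, NULL, 0, NULL, NULL);
    if (size <= 0)
        return std::string(); // conversion failed
    std::string utf8(static_cast<size_t>(size), '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide, -1, &utf8[0], size, NULL, NULL);
    utf8.resize(static_cast<size_t>(size) - 1); // drop the trailing NUL
    return utf8;
}
#endif

// Hypothetical helper: return the program arguments as UTF-8 strings.
std::vector<std::string> utf8_args(int argc, char **argv) {
    std::vector<std::string> args;
#ifdef _WIN32
    // Ignore the code-page-encoded argv and ask Windows for the UTF-16 version.
    (void)argc;
    (void)argv;
    int wargc = 0;
    LPWSTR *wargv = CommandLineToArgvW(GetCommandLineW(), &wargc);
    if (wargv == NULL)
        return args; // caller sees an empty list on failure
    for (int i = 0; i < wargc; ++i)
        args.push_back(utf16_to_utf8(wargv[i]));
    LocalFree(wargv); // the array is owned by the caller
#else
    // Assumption: elsewhere argv is already UTF-8, so copy it unchanged.
    args.assign(argv, argv + argc);
#endif
    return args;
}
```

Calling utf8_args(argc, argv) first thing in main would give the rest of the
program one encoding to deal with, whatever the platform.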

>
> On Sep 1, 2011, at 4:17 PM, Nikola Smiljanic wrote:
>
>> AFAIK Clang internals do assume UTF-8, and llvm::sys::path converts
>> strings to UTF-16 on Windows and calls the W API functions.
>>
>> I'd appreciate it if somebody would take a look at my changes and comment
>> on them. Here's a brief explanation of what I did:
>>
>> - Convert argv to UTF-8 using the current system locale on Win32 (this is
>> done as early as possible, inside ExpandArgv). This makes the driver happy
>> since calls to llvm::sys::path::exists succeed.
>> - Change calls to ::open (inside FileSystemStatCache and MemoryBuffer) to
>> ::_wopen on Win32 by converting the path to UTF-16.
>> - In order to do the conversions I had to expose two functions: one of
>> them was already there but wasn't visible, the other one I added.
>>
>> Known issues:
>>
>> - I should probably use LLVM_ON_WIN32 instead of WIN32, but this macro
>> isn't defined inside FileSystemStatCache and MemoryBuffer for some reason.
>> Both of these files have an #ifdef section that deals with O_BINARY, so maybe
>> these two sections should be consolidated?
>> - The functions convert_multibyte_to_utf8 and convert_utf8_to_utf16 have
>> definitions only on Windows, so every other platform is currently broken.
>>
>> On Thu, Sep 1, 2011 at 5:44 PM, Ruben Van Boxem <vanboxem.ruben at gmail.com> wrote:
>>>
>>> Isn't it more straightforward to use UTF-8 internally, use the
>>> conversion functions provided by the Win32 API when calling other Win32 API
>>> functions, and always call the wide versions of the Win32 functions? Full
>>> compatibility guaranteed, and one encoding internally.
>>>
>>> Ruben
>>
>>
>> <unicode_path_clang.patch><unicode_path_llvm.patch>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
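
The idea discussed in the quoted thread above (keep paths in UTF-8 internally,
convert to UTF-16 only at the Windows boundary, and call the wide function)
could be sketched as follows. This is not the code from the attached patches:
utf8_to_utf16 and open_utf8_readonly are hypothetical names, and the
non-Windows branch assumes the platform accepts the UTF-8 bytes as-is, so that
no platform is left without a definition.

```cpp
#include <fcntl.h>
#include <string>

#ifdef _WIN32
#include <windows.h>
#include <io.h> // _wopen

// Hypothetical helper: UTF-8 -> UTF-16 via MultiByteToWideChar.
static std::wstring utf8_to_utf16(const std::string &utf8) {
    if (utf8.empty())
        return std::wstring();
    const int len = static_cast<int>(utf8.size());
    // First call only measures; MB_ERR_INVALID_CHARS rejects malformed input.
    int size = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                   utf8.data(), len, NULL, 0);
    if (size <= 0)
        return std::wstring(); // conversion failed
    std::wstring wide(static_cast<size_t>(size), L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8.data(), len, &wide[0], size);
    return wide;
}
#endif

// Hypothetical helper: open a UTF-8 encoded path read-only.
// Returns a file descriptor, or -1 on failure.
int open_utf8_readonly(const std::string &path) {
#ifdef _WIN32
    std::wstring wide = utf8_to_utf16(path);
    if (wide.empty())
        return -1; // empty or malformed path
    return ::_wopen(wide.c_str(), _O_RDONLY | _O_BINARY);
#else
    // Assumption: the file system takes the UTF-8 bytes unchanged.
    return ::open(path.c_str(), O_RDONLY);
#endif
}
```

Callers never see a wchar_t with this shape; the conversion stays inside the
one function that talks to the operating system.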