[LLVMdev] request for windows unicode support

Michael Spencer bigcheesegs at gmail.com
Fri Nov 26 08:47:35 PST 2010


2010/11/26 Jochen Wilhelmy <j.wilhelmy at arcor.de>:
>
>> Can't you just store filenames as UTF8 (like you do on Linux), and
>> convert UTF8 to widechar just when calling the windows APIs?
>> Same for converting back directory listings as such, you get widechar,
>> and convert back to UTF8.
>> All you would need to do is implement that conversion in System/Win32,
>> I think MultiByteToWideChar supports UTF8, doesn't it?
>>
>
> I would think the most efficient approach is to use utf16 (i.e. wchar_t)
> internally on windows (otherwise utf8). Then if a path is used multiple
> times no conversion takes place. The conversion only takes place at
> creation time when you create a path from utf8.

The current API is stateless, meaning that the user is responsible for
the storage and format of paths, so there is no internal storage.
However, we could cache the conversion in a thread-local, limited-size
LRU cache, depending on how long the conversion takes. Storing
strings as UTF-16 would require converting them to UTF-8 whenever the
client wanted to look at them, incurring lots of memory allocations
and copying.
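
A minimal sketch of such a cache, to make the idea concrete. Everything
named here (utf8ToUtf16, ConversionCache, cachedUtf16, the capacity of 64)
is hypothetical and not part of any existing or proposed LLVM API. The
decoder is a portable stand-in for calling MultiByteToWideChar with CP_UTF8
on Windows, and it assumes well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for MultiByteToWideChar(CP_UTF8, ...).
// Assumes well-formed UTF-8; malformed sequences are not diagnosed.
inline std::u16string utf8ToUtf16(const std::string &In) {
  std::u16string Out;
  for (std::size_t I = 0; I < In.size();) {
    unsigned char C = static_cast<unsigned char>(In[I]);
    std::uint32_t CP;
    std::size_t Len;
    if (C < 0x80)             { CP = C;        Len = 1; }
    else if ((C >> 5) == 0x6) { CP = C & 0x1F; Len = 2; }
    else if ((C >> 4) == 0xE) { CP = C & 0x0F; Len = 3; }
    else                      { CP = C & 0x07; Len = 4; }
    // Fold in the continuation bytes, six payload bits each.
    for (std::size_t J = 1; J < Len && I + J < In.size(); ++J)
      CP = (CP << 6) | (static_cast<unsigned char>(In[I + J]) & 0x3F);
    I += Len;
    if (CP < 0x10000) {
      Out.push_back(static_cast<char16_t>(CP));
    } else {
      // Code points above the BMP become a surrogate pair.
      CP -= 0x10000;
      Out.push_back(static_cast<char16_t>(0xD800 + (CP >> 10)));
      Out.push_back(static_cast<char16_t>(0xDC00 + (CP & 0x3FF)));
    }
  }
  return Out;
}

// Fixed-capacity LRU cache mapping a UTF-8 path to its UTF-16 form.
class ConversionCache {
  using Entry = std::pair<std::string, std::u16string>;
  std::list<Entry> Items; // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> Index;
  std::size_t Capacity;
  std::size_t Misses = 0;

public:
  explicit ConversionCache(std::size_t Cap) : Capacity(Cap) {
    assert(Cap > 0 && "cache needs room for at least one entry");
  }

  // The returned reference stays valid only until the entry is evicted.
  const std::u16string &get(const std::string &Utf8) {
    auto It = Index.find(Utf8);
    if (It != Index.end()) {
      // Hit: move the entry to the front; list iterators stay valid.
      Items.splice(Items.begin(), Items, It->second);
      return It->second->second;
    }
    ++Misses;
    if (Items.size() == Capacity) {
      // Full: drop the least recently used entry.
      Index.erase(Items.back().first);
      Items.pop_back();
    }
    Items.emplace_front(Utf8, utf8ToUtf16(Utf8));
    Index[Utf8] = Items.begin();
    return Items.front().second;
  }

  std::size_t misses() const { return Misses; }
};

// One cache per thread, so no locking is needed. 64 is an arbitrary size.
inline const std::u16string &cachedUtf16(const std::string &Utf8) {
  thread_local ConversionCache Cache(64);
  return Cache.get(Utf8);
}
```

A Win32 wrapper would call something like cachedUtf16(Path) right before
the wide-character API call, so a path used repeatedly on one thread is
converted once. Whether that beats converting every time depends on how
expensive the conversion turns out to be compared with the hash lookup.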

> even if you have reasons not to use it, you should have a look at
> www.boost.org/doc/libs/1_45_0/libs/filesystem/v3/doc/index.htm
> www.boost.org/doc/libs/1_45_0/libs/filesystem/v3/doc/v3_design.html
>
> -Jochen

My design is based directly on that.

- Michael Spencer


