[LLVMdev] [cfe-dev] Unicode path handling on Windows

Bryce Cogswell bryceco at yahoo.com
Tue Sep 6 17:22:25 PDT 2011


As was mentioned once before, the correct solution is to never use multibyte strings anywhere. Any Windows function that currently returns a multibyte string should be replaced with its wide-string (Unicode) equivalent, and the result converted to UTF-8.
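
To make the "convert the wide result to UTF-8" step concrete, here is a minimal sketch. On Windows this would normally be done by calling WideCharToMultiByte with CP_UTF8 on the UTF-16 string returned by the wide API. The code below is a portable hand-written equivalent so it can be built anywhere. The helper name utf16_to_utf8 is hypothetical and not part of LLVM, and replacing unpaired surrogates with U+FFFD is my own assumption about error handling.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not an LLVM API): encode a UTF-16 string as UTF-8.
// Assumption: unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string &in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    std::uint32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: combine with the following low surrogate if present.
      if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
        ++i;
      } else {
        cp = 0xFFFD;
      }
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      cp = 0xFFFD; // Stray low surrogate.
    }

    // Emit 1 to 4 UTF-8 bytes depending on the code point's magnitude.
    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}
```

With something like this at the boundary, every string inside the code base is UTF-8 and the current code page never enters the picture.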


> From: Nikola Smiljanic <popizdeh at gmail.com>
>
> I think I got it this time. I realized that ::open and ::stat work just fine with multibyte paths on Windows, so there's no need to change this code. The only problem is the llvm::sys::fs module, which wrongly assumes that input strings are UTF-8 encoded when they are in fact multibyte strings.
>
> Now I really hope I haven't broken anything, because llvm::sys::fs::exists is called in a number of places, but I'm guessing that none of the paths that are passed to it are really UTF-8?
>
> I think the entire llvm::sys::fs module should be changed to use MultibyteToUTF16 instead of UTF8ToUTF16 before calling Windows API functions (unless somebody knows that we actually have UTF-8 paths on Windows somewhere in the code)?




