[LLVMdev] [cfe-dev] Unicode path handling on Windows

Joachim Durchholz jo at durchholz.org
Mon Oct 3 13:59:58 PDT 2011


On 03.10.2011 22:12, Nikola Smiljanic wrote:
> How about this:
>
> for (int i = 0; i != NumWChars; ++i)
>          absPath[i] = std::tolower(absPath[i], std::locale());
>
> seems to be working just fine?

You are making two assumptions here:

Assumption 1: For each lowercase character, there is exactly one equivalent 
uppercase character, and vice versa.
This is not true in half a dozen languages according to
ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt . For example, 
German "ß" uppercases to the two characters "SS".
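To make Assumption 1 concrete: a per-character call such as the quoted std::tolower loop returns one character for each character, so it cannot express a mapping like "ß" to "SS". The sketch below is a minimal illustration only, assuming hypothetical helpers `full_upper_char` and `full_upper` and a hand-copied table of just four unconditional entries from SpecialCasing.txt (plus plain ASCII). It is not a complete case mapping; a real program should take the data from a Unicode library.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: a few unconditional entries from SpecialCasing.txt
// where one character uppercases to two. Everything outside this table and
// ASCII is passed through unchanged in this sketch.
std::u32string full_upper_char(char32_t c) {
    static const std::unordered_map<char32_t, std::u32string> special = {
        {U'\u00DF', U"SS"},       // LATIN SMALL LETTER SHARP S -> "SS"
        {U'\uFB00', U"FF"},       // LATIN SMALL LIGATURE FF    -> "FF"
        {U'\uFB01', U"FI"},       // LATIN SMALL LIGATURE FI    -> "FI"
        {U'\u0149', U"\u02BCN"},  // N PRECEDED BY APOSTROPHE   -> "\u02BC" "N"
    };
    auto it = special.find(c);
    if (it != special.end()) return it->second;
    // Assumption: only ASCII gets a simple one-to-one mapping here.
    if (c >= U'a' && c <= U'z') {
        return std::u32string(1, static_cast<char32_t>(c - U'a' + U'A'));
    }
    return std::u32string(1, c);
}

// Uppercase a whole string; the result may be longer than the input.
std::u32string full_upper(const std::u32string& s) {
    std::u32string out;
    for (char32_t c : s) out += full_upper_char(c);
    return out;
}
```

The result can be longer than the input ("straße" has six characters, "STRASSE" has seven), which is why an in-place loop over a fixed-size buffer cannot produce it.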

Assumption 2: The mapping between lower case and upper case can be 
done for each character individually, without considering context.
This is not true in a couple of languages according to SpecialCasing.txt.
For example, Greek capital sigma lowercases to "ς" at the end of a word 
but to "σ" elsewhere.
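Assumption 2 can be illustrated with the Greek sigma rule. The sketch below uses hypothetical helpers `is_cased_letter` and `greek_lower`. It only knows ASCII and the basic Greek capitals, and it approximates Unicode's Final_Sigma condition by looking at the immediate neighbours, ignoring the Case_Ignorable characters the real rule skips over. A function that sees one character at a time, like the quoted loop, has no way to make this choice.

```cpp
#include <cstddef>
#include <string>

// Simplified, hypothetical notion of "cased letter": ASCII letters and the
// basic Greek letters only. Unicode's real Final_Sigma condition uses the
// Cased and Case_Ignorable properties, which this sketch does not model.
bool is_cased_letter(char32_t c) {
    return (c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z') ||
           (c >= U'\u0391' && c <= U'\u03A9') ||
           (c >= U'\u03B1' && c <= U'\u03C9');
}

// Lowercase ASCII and basic Greek capitals, choosing the form of sigma
// from the surrounding characters.
std::u32string greek_lower(const std::u32string& s) {
    std::u32string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t c = s[i];
        if (c == U'\u03A3') {  // GREEK CAPITAL LETTER SIGMA
            bool after_letter = i > 0 && is_cased_letter(s[i - 1]);
            bool before_letter = i + 1 < s.size() && is_cased_letter(s[i + 1]);
            // Word-final: preceded by a letter and not followed by one.
            out += ((after_letter && !before_letter) ? U'\u03C2'    // final sigma
                                                     : U'\u03C3');  // ordinary sigma
        } else if ((c >= U'\u0391' && c <= U'\u03A9') || (c >= U'A' && c <= U'Z')) {
            out += static_cast<char32_t>(c + 0x20);  // capital -> small
        } else {
            out += c;
        }
    }
    return out;
}
```

With this sketch, "ΟΔΟΣ" becomes "οδος" ending in "ς", while "ΣΟΣ" starts with "σ" and ends with "ς".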

Do not do that. If you get complaints, they will be about scripts that 
you can't type on your keyboard and know nothing about, so you won't 
even know what the right behaviour would have been.
Rely on the relevant Unicode library. Which one that would be, and which 
functions to call, depends on what you need that to-lowercase 
transformation for. (It also depends on whether the names you get are 
already normalized or not; I'd want to run a normalization pass on the 
names first just to be on the safe side.)
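Why normalize first: the same visible name can arrive as a precomposed "é" (U+00E9) or as "e" followed by a combining acute accent (U+0301), and the two compare unequal code point by code point. The function `toy_compose` below (a hypothetical name) performs canonical composition for just six pairs so the effect is visible. A real pass should call a library normalizer, for example ICU's Normalizer2, which also handles combining-mark reordering and all the pairs this sketch leaves out.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Pack a (base, combining mark) pair into one 64-bit lookup key.
constexpr std::uint64_t pair_key(char32_t base, char32_t mark) {
    return (static_cast<std::uint64_t>(base) << 32) | mark;
}

// Toy canonical composition: only a few pairs whose precomposed forms are
// defined by Unicode canonical decomposition. Not a real NFC implementation.
std::u32string toy_compose(const std::u32string& s) {
    static const std::unordered_map<std::uint64_t, char32_t> pairs = {
        {pair_key(U'a', U'\u0301'), U'\u00E1'},  // a + acute     -> á
        {pair_key(U'e', U'\u0301'), U'\u00E9'},  // e + acute     -> é
        {pair_key(U'e', U'\u0300'), U'\u00E8'},  // e + grave     -> è
        {pair_key(U'a', U'\u0308'), U'\u00E4'},  // a + diaeresis -> ä
        {pair_key(U'o', U'\u0308'), U'\u00F6'},  // o + diaeresis -> ö
        {pair_key(U'u', U'\u0308'), U'\u00FC'},  // u + diaeresis -> ü
    };
    std::u32string out;
    for (char32_t c : s) {
        if (!out.empty()) {
            auto it = pairs.find(pair_key(out.back(), c));
            if (it != pairs.end()) {
                out.back() = it->second;  // replace base with precomposed form
                continue;
            }
        }
        out.push_back(c);
    }
    return out;
}
```

After this step "cafe" + U+0301 and "caf" + U+00E9 compare equal; only then does any case mapping or comparison of the names make sense.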

Regards,
Jo


