[LLVMdev] [cfe-dev] Unicode path handling on Windows
Török Edwin
edwintorok at gmail.com
Mon Oct 3 14:18:03 PDT 2011
On 10/03/2011 11:59 PM, Joachim Durchholz wrote:
> On 03.10.2011 22:12, Nikola Smiljanic wrote:
>> How about this:
>>
>> for (int i = 0; i != NumWChars; ++i)
>> absPath[i] = std::tolower(absPath[i], std::locale());
>>
>> seems to be working just fine?
>
> You have two assumptions here:
>
> Assumption 1: For each lowercase character, there is an equivalent
> uppercase character, and vice versa.
> This is not true in half a dozen languages according to
> ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt .
>
> Assumption 2: The transformation from lower case to upper case can be
> done for each character individually, without considering context.
> This is not true in a couple of languages according to SpecialCasing.txt.
>
> Do not do that. If you get complaints, they will be about scripts that
> you can't type on your keyboard, and that you know nothing about so you
> don't even know what the right behaviour would have been.
> Rely on the relevant Unicode library. Which one that would be, and which
> functions to call, depends on what you need that to-lowercase
> transformation for. (It also depends on whether the names you get are
> already normalized or not; I'd want to run a normalization pass on the
> names first just to be on the safe side.)
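
To make the two assumptions above concrete, here is a minimal sketch of
where a per-character loop goes wrong. The helpers lowerOne and
toLowerFull are hypothetical names made up for this illustration, the
mapping table covers only ASCII, basic Greek capitals, and two entries
from SpecialCasing.txt, and the final-sigma test is a crude stand-in for
the real Final_Sigma condition. It is not a replacement for a Unicode
library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only: lowercase the code point at
// position i of s. It returns a string, not a single character, because a
// full case mapping may produce more than one code point.
std::u32string lowerOne(const std::u32string &s, std::size_t i) {
  assert(i < s.size());
  const char32_t c = s[i];

  // Assumption 1 fails: U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE)
  // lowercases to two code points, 'i' followed by U+0307.
  if (c == U'\u0130')
    return U"i\u0307";

  // Assumption 2 fails: U+03A3 (GREEK CAPITAL LETTER SIGMA) depends on
  // context. Crude approximation of Final_Sigma: preceded by a non-space
  // character and followed by end of string or a space.
  if (c == U'\u03A3') {
    const bool precededByText = i > 0 && s[i - 1] != U' ';
    const bool atWordEnd = i + 1 == s.size() || s[i + 1] == U' ';
    if (precededByText && atWordEnd)
      return U"\u03C2";  // GREEK SMALL LETTER FINAL SIGMA
    return U"\u03C3";    // GREEK SMALL LETTER SIGMA
  }

  // Simple one-to-one mappings: ASCII A-Z and Greek capitals
  // U+0391..U+03A9 (U+03A2 is unassigned and left alone).
  if (c >= U'A' && c <= U'Z')
    return std::u32string(1, static_cast<char32_t>(c + 0x20));
  if (c >= 0x0391 && c <= 0x03A9 && c != 0x03A2)
    return std::u32string(1, static_cast<char32_t>(c + 0x20));

  return std::u32string(1, c);  // everything else is passed through
}

// Lowercase a whole string; the result may be longer than the input, so
// it cannot be written back in place one element at a time.
std::u32string toLowerFull(const std::u32string &s) {
  std::u32string out;
  for (std::size_t i = 0; i != s.size(); ++i)
    out += lowerOne(s, i);
  return out;
}
```

With this sketch, toLowerFull(U"\u0130") has length 2 although the input
has length 1, and the same capital sigma comes out as U+03C2 at the end
of a word but as U+03C3 at the start of one. An in-place loop of the
form absPath[i] = tolower(absPath[i]) cannot express either result.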
Does Windows do proper Unicode to-lowercase, or does it just lowercase A-Z?
From reading the article below, I gather that you can create filenames that
Unicode to-lowercase rules would consider identical, yet that exist as different files:
https://blogs.msdn.com/b/michkap/archive/2005/10/17/481600.aspx
Best regards,
--Edwin