[LLVMdev] [cfe-dev] Unicode path handling on Windows
Joachim Durchholz
jo at durchholz.org
Mon Oct 3 13:59:58 PDT 2011
On 03.10.2011 22:12, Nikola Smiljanic wrote:
> How about this:
>
> for (int i = 0; i != NumWChars; ++i)
> absPath[i] = std::tolower(absPath[i], std::locale());
>
> seems to be working just fine?
You are making two assumptions here:
Assumption 1: For each lowercase character, there is an equivalent
uppercase character, and vice versa.
This is not true in half a dozen languages according to
ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt .
Assumption 2: The transformation from upper case to lower case can be
done for each character individually, without considering context.
This is not true in a couple of languages according to SpecialCasing.txt.
Do not do that. If you get complaints, they will be about scripts that
you cannot type on your keyboard and know nothing about, so you will
not even know what the right behaviour would have been.
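To make the two assumptions concrete, here is a minimal sketch in C++. It hard-codes just two entries that I am sure are in SpecialCasing.txt: U+0130 lowercases to the two code points U+0069 U+0307 outside Turkic locales, and capital sigma U+03A3 lowercases to final sigma U+03C2 at the end of a word. The function names are made up for illustration, the letter test only knows basic Greek, and the final-sigma condition is simplified; this is not a substitute for a real Unicode library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified stand-in for "is a letter": only basic Greek is covered here.
bool isGreekLetter(char32_t c) {
    return (c >= 0x0391 && c <= 0x03A9 && c != 0x03A2) ||
           (c >= 0x03B1 && c <= 0x03C9);
}

// Per-character lowering, the same shape as the quoted loop.
// Handles ASCII and basic Greek capitals; everything else passes through.
char32_t lowerOne(char32_t c) {
    if (c >= U'A' && c <= U'Z') return static_cast<char32_t>(c + 0x20);
    if (c >= 0x0391 && c <= 0x03A9 && c != 0x03A2)
        return static_cast<char32_t>(c + 0x20);
    return c;
}

// One output code point per input code point, no context.
std::u32string lowerEach(const std::u32string& s) {
    std::u32string out;
    for (char32_t c : s) out.push_back(lowerOne(c));
    return out;
}

// Same mapping plus the two special cases named in the text above.
std::u32string lowerWithSpecialCasing(const std::u32string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char32_t c = s[i];
        if (c == 0x0130) {
            // One code point in, two out: 'i' followed by combining dot above.
            out += U"i\u0307";
        } else if (c == 0x03A3) {
            // Simplified Final_Sigma: a letter before, no letter after.
            const bool letterBefore = i > 0 && isGreekLetter(s[i - 1]);
            const bool letterAfter =
                i + 1 < s.size() && isGreekLetter(s[i + 1]);
            out.push_back(letterBefore && !letterAfter ? U'\u03C2' : U'\u03C3');
        } else {
            out.push_back(lowerOne(c));
        }
    }
    return out;
}
```

With the four-letter input U+039F U+0394 U+039F U+03A3, lowerEach ends the word in U+03C3, while lowerWithSpecialCasing ends it in U+03C2. For U+0130 the two functions return strings of different lengths, which an in-place loop over a fixed-size buffer cannot express.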
Rely on the relevant Unicode library. Which one that would be, and which
functions to call, depends on what you need that to-lowercase
transformation for. (It also depends on whether the names you get are
already normalized or not; I'd want to run a normalization pass on the
names first just to be on the safe side.)
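As a small illustration of why I would want the normalization pass: the same visible name can arrive as different code point sequences, and a plain comparison treats them as different. The sketch below uses a toy function (composeToy is a made-up name) that merges only the single pair 'e' + U+0301 into U+00E9. A real normalization function from a Unicode library handles every such pair; this one exists only to show the effect.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy composition: merges 'e' followed by U+0301 (combining acute accent)
// into the precomposed U+00E9, and leaves everything else untouched.
// This is NOT a real normalizer, just enough to show the problem.
std::u32string composeToy(const std::u32string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == U'e' && i + 1 < s.size() && s[i + 1] == U'\u0301') {
            out.push_back(U'\u00E9');
            ++i;  // skip the combining mark that was just merged
        } else {
            out.push_back(s[i]);
        }
    }
    return out;
}

// Two spellings of the same visible file name.
const std::u32string precomposed = U"caf\u00E9.h";
const std::u32string decomposed = U"cafe\u0301.h";
```

Comparing precomposed and decomposed directly reports a mismatch; comparing composeToy(precomposed) with composeToy(decomposed) does not. Any case conversion applied before such a pass inherits the same mismatch.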
Regards,
Jo