[libcxx-commits] [PATCH] D120796: [libcxx] Fix the ctype `is` (pointer version) function for Windows

Martin Storsjö via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Wed Mar 2 14:24:01 PST 2022


mstorsjo added inline comments.


================
Comment at: libcxx/test/std/localization/locale.categories/category.ctype/locale.ctype.byname/is_1.pass.cpp:110-112
+            // On Windows, in the "C" locale, \x00DA is interpreted according
+            // to the current code page, where this char can be considered an
+            // uppercase alphabetical character.
----------------
mstorsjo wrote:
> Quuxplusone wrote:
> > mstorsjo wrote:
> > > Quuxplusone wrote:
> > > > This comment is too vague for my liking. I infer that "can be" means sometimes it is and sometimes it isn't. I'd rather see something very specific, like
> > > > ```
> > > > // On Windows, this depends on the current codepage.
> > > > // If the codepage is CP-437, \x00DA is U+250C BOX DRAWINGS LIGHT DOWN AND RIGHT.
> > > > // If the codepage is CP-1252, \x00DA is U+00DA CAPITAL LETTER U WITH ACUTE.
> > > > ```
> > > > And then I'm not actually sure why the non-Windows behavior claims that U+00DA is //not// alpha+upper.
> > > I think in the non-Windows case, the "C" locale might correspond to plain ASCII or something like that, where `\x00DA` isn't considered an alphabetical character.
> > Nit: for readability, I'd prefer linebreaks before each "If", and ideally no other linebreaks.
> > 
> > I'm still confused as to why the comment says "it depends" but the code on lines 114-115 is very clearly assuming that the character //is// U+00DA CAPITAL LETTER U WITH ACUTE. Is this because the comment is wrong? or because the test runner forces a specific codepage? or because we just luck into a specific codepage and this is super fragile? (Ditto throughout.)
> I think this (and a bunch of other similar tests for the "C" locale) might behave differently, if the systemwide default codepage was a more exotic one, yes. Hence, it strictly depends on what it is set to. I dno't think the tests have much say over what the "C" locale is.
> 
> Sure, I can unwrap the comment to make it more readable as you wrote originally.
As an alternative, we could also just ifdef out all tests for the non-ASCII chars in the "C" locales on Windows, as the interpretation strictly does depend on what the system's default codepage is.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120796/new/

https://reviews.llvm.org/D120796



More information about the libcxx-commits mailing list