[PATCH] D37958: [libc++] Correctly propagate user-defined lookup_classname().
Arthur O'Dwyer via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Sep 18 11:23:04 PDT 2017
Quuxplusone added inline comments.
================
Comment at: libcxx/test/std/re/re.traits/lookup_classname_user_defined.pass.cpp:46
+ // matches all characters (they are classified as alnum)
+ std::wstring re1 = L"([[:alnum:]]+)";
+ std::regex_search(in, m, std::wregex(re1));
----------------
timshen wrote:
> Quuxplusone wrote:
> > Could you add a test here for
> >
> > std::wstring re3 = L"([[:ALNUM:]]+)";
> > std::regex_search(in, m, std::wregex(re3, std::regex_constants::icase));
> >
> > std::wstring re4 = L"(\\W+)";
> > std::regex_search(in, m, std::wregex(re4, std::regex_constants::icase));
> >
> > documenting the expected outputs? It's unclear to me from cppreference
> > http://en.cppreference.com/w/cpp/regex/regex_traits/lookup_classname
> > whether lookup_classname("W") is supposed to produce a result or not (but you seem to assume it does).
> >
> > My understanding is that the "icase" parameter to lookup_classname is talking about the icaseness of the regex matcher; classnames should always be matched with exact case, i.e. `[[:alnum:]]` is always a valid classname and `[[:ALNUM:]]` is always invalid, regardless of regex_constants::icase. But I'm not sure.
> [re.req] says that, for lookup_classname(), "The value returned shall be independent of the case of the characters in the sequence."
>
> I take it as regardless of lookup_classname()'s icase argument, [[:ALNUM:]] is always valid.
>
> There are existing tests that confirms it in std/re/re.traits/lookup_classname.pass.cpp. Search for "AlNum".
>
> I fixed my patch, since I was misunderstanding it as well (I thought icase is for the input char sequence). Now they are just forwarded into lookup_classname().
> [re.req] says that, for lookup_classname(), "The value returned shall be independent of the case of the characters in the sequence."
Huh. That seems like a defect in the Standard, since although lookup_classname() is parameterized on a locale, there is no standard way to get case-insensitive string comparison out of an arbitrary locale AFAIK. However, that's a problem for the implementer of lookup_classname(), not for you. Forwarding straight to lookup_classname() and letting it deal with questions of "case" sounds like the right approach to me.
If you *wanted* to go down this rabbit hole, a good test case would be `"[[:DIGIT:]]"` in Turkish locale (where lowercasing `"[[:DIGIT:]]"` produces `"[[:dıgıt:]]"` not `"[[:digit:]]"`).
(Note— The only place I find "case-insensitive" in N4659 is in the informative note on `time_get::get`: "It is unspecified by what means the function performs case-insensitive comparison or whether multi-character sequences are considered while doing so." —end note)
https://reviews.llvm.org/D37958
More information about the cfe-commits
mailing list