[PATCH] D37958: [libc++] Correctly propagate user-defined lookup_classname().

Arthur O'Dwyer via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Sep 18 11:23:04 PDT 2017


Quuxplusone added inline comments.


================
Comment at: libcxx/test/std/re/re.traits/lookup_classname_user_defined.pass.cpp:46
+    // matches all characters (they are classified as alnum)
+    std::wstring re1 = L"([[:alnum:]]+)";
+    std::regex_search(in, m, std::wregex(re1));
----------------
timshen wrote:
> Quuxplusone wrote:
> > Could you add a test here for
> > 
> >     std::wstring re3 = L"([[:ALNUM:]]+)";
> >     std::regex_search(in, m, std::wregex(re3, std::regex_constants::icase));
> > 
> >     std::wstring re4 = L"(\\W+)";
> >     std::regex_search(in, m, std::wregex(re4, std::regex_constants::icase));
> > 
> > documenting the expected outputs?  It's unclear to me from cppreference
> > http://en.cppreference.com/w/cpp/regex/regex_traits/lookup_classname
> > whether lookup_classname("W") is supposed to produce a result or not (but you seem to assume it does).
> > 
> > My understanding is that the "icase" parameter to lookup_classname is talking about the icaseness of the regex matcher; classnames should always be matched with exact case, i.e. `[[:alnum:]]` is always a valid classname and `[[:ALNUM:]]` is always invalid, regardless of regex_constants::icase. But I'm not sure.
> [re.req] says that, for lookup_classname(), "The value returned shall be independent of the case of the characters in the sequence."
> 
> I take that to mean that [[:ALNUM:]] is always valid, regardless of lookup_classname()'s icase argument.
> 
> There are existing tests that confirm this in std/re/re.traits/lookup_classname.pass.cpp. Search for "AlNum".
> 
> I fixed my patch, since I had misunderstood it as well (I thought icase was for the input char sequence). Now they are just forwarded into lookup_classname().
> [re.req] says that, for lookup_classname(), "The value returned shall be independent of the case of the characters in the sequence."

Huh. That seems like a defect in the Standard, since although lookup_classname() is parameterized on a locale, there is no standard way to get case-insensitive string comparison out of an arbitrary locale AFAIK. However, that's a problem for the implementer of lookup_classname(), not for you. Forwarding straight to lookup_classname() and letting it deal with questions of "case" sounds like the right approach to me.

If you *wanted* to go down this rabbit hole, a good test case would be `"[[:DIGIT:]]"` in a Turkish locale (where lowercasing `"[[:DIGIT:]]"` produces `"[[:dıgıt:]]"`, not `"[[:digit:]]"`).

(Note— The only place I find "case-insensitive" in N4659 is in the informative note on `time_get::get`: "It is unspecified by what means the function performs case-insensitive comparison or whether multi-character sequences are considered while doing so." —end note)
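
For what it's worth, the [re.req] wording quoted above can be checked directly against the default traits. This is only a minimal sketch, assuming `std::regex_traits<char>` under the default "C" global locale; the helper names `same_class`, `is_known`, and `check_case_independence` are hypothetical and not part of the patch or of the existing tests.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helpers for illustration only; not part of D37958.
using Traits = std::regex_traits<char>;

// True if both class names map to the same character class mask.
bool same_class(const std::string& a, const std::string& b) {
    Traits t;  // default-constructed traits use a copy of the global locale
    return t.lookup_classname(a.begin(), a.end()) ==
           t.lookup_classname(b.begin(), b.end());
}

// True if the name is recognized, i.e. the result differs from a
// value-initialized (empty) char_class_type.
bool is_known(const std::string& name) {
    Traits t;
    return !(t.lookup_classname(name.begin(), name.end()) ==
             Traits::char_class_type());
}

// Assumption: the implementation follows [re.req], so the case of the
// characters in the class name does not affect the returned value.
void check_case_independence() {
    assert(is_known("alnum"));
    assert(is_known("ALNUM"));
    assert(same_class("alnum", "ALNUM"));
    assert(same_class("alnum", "AlNum"));
    assert(!is_known("no_such_class"));
}
```

The sketch deliberately stays in the "C" locale; the Turkish-locale case is exactly where an implementation's lowercasing strategy would start to matter, and it would need that locale to be installed on the test machine.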


https://reviews.llvm.org/D37958




