libc++ patch for LWG Issue 2232 - dangling reference for regex_iterators

Marshall Clow mclow.lists at gmail.com
Wed Feb 19 12:44:25 PST 2014


On Feb 19, 2014, at 11:21 AM, David Blaikie <dblaikie at gmail.com> wrote:

> 
> 
> 
> On Wed, Feb 19, 2014 at 9:17 AM, Marshall Clow <mclow.lists at gmail.com> wrote:
> On Feb 14, 2014, at 4:31 PM, David Blaikie <dblaikie at gmail.com> wrote:
>> On Fri, Feb 14, 2014 at 4:21 PM, Marshall Clow <mclow.lists at gmail.com> wrote:
>> Users can write
>>         for(sregex_iterator i(s.begin(), s.end(), regex("meow")), end; i != end; ++i)
>> 
>> binding a temporary regex to const regex& and storing a pointer to it.
>> This will compile silently, triggering undefined behavior at runtime.
>> 
>> Fixing this involves defining constructors for the various regex iterator types from rvalue regexes, and then marking them as “deleted”.
>> And tests.
>> 
>> Would this break (what I assume is) correct code doing things like:
>> 
>> std::find(sregex_iterator(s.begin(), s.end(), regex("meow")), end, "foo”);  ?
> 
> (Recapping a discussion on IRC)
> 
> Yes, it would.
> And though that code is "technically correct” it (a) is dangerous, and (b) doesn’t do anything useful.
> 
> The return result from std::find is an interator into a destroyed object, so you can’t really do anything with that.
> 
> The “fix” for this code is simple:
> 
> 	regex re{“meow”}
> 	std::find(sregex_iterator(s.begin(), s.end(), re), end, "foo”); 
> 
> and now the return value of std::find is valid (until the end of the lifetime of “re”).
> 
> Fair point about find, though it's not too hard to find examples that take iterators and don't return temporaries from them.
> 
> There are lots of APIs that take references to things and return references with the same lifetime as the parameter and would break in similar situations, but perhaps these particular APIs are substantially more prone to this kind of problem and worthy of a specific safety guard.

If you’ve got a list, I’ll take a look at them.
Maybe the committee will want to “protect” them too.

>  
> See 
> 	http://cplusplus.github.io/LWG/lwg-active.html#2329
> and	http://cplusplus.github.io/LWG/lwg-active.html#2332
> 
> for the changes to the (C++14) standard.
> 
> In any case, this was entirely a misunderstanding on my part - not realizing (though you'd mentioned in the subject) that this was a new, deliberately non-strictly-backwards-compatible change mandated by the next standard. I'd mistakenly thought this was an attempt to add some standard-allowable-but-not-mandated safety so I was confused why it would break existing correct callers. I'm still a little surprised at this being worth a non-backward-compatible change in the standard, but that's up to you and the rest of the committee and I didn't mean to get in the way of whatever decision came out of that.
> 
> A few minor, optional, comments on the patch itself:
> 
> * An extra explicit scope in each of the test cases seems strange, but perhaps there's a reason for it given how consistently you've done that.

That makes it easy to copy/paste blocks of code, w/o worrying about variable name conflicts.


> * The "= delete" ctors you've added aren't preprocessor protected (just "// C++14") - why's that? Won't this break C++11? Or was this change accepted as a DR to ’11?

They aren’t?  Checks…

In the synopsis (the big comment at the front of the file), this is true. 

    regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b,
                         const regex_type&& re, int submatch = 0,
                         regex_constants::match_flag_type m = regex_constants::match_default) = delete; // C++14

But that’s just a comment.
Down below, in the code, we have:

#if _LIBCPP_STD_VER > 11
    regex_token_iterator(_BidirectionalIterator __a, _BidirectionalIterator __b,
                         const regex_type&& __re, int __submatch = 0,
                         regex_constants::match_flag_type __m =
                                       regex_constants::match_default) = delete;
#endif


> * The version tests of "> 11" and "<= 11" seem a bit harder to read (to me) than ">= 14" and "< 14" which speaks to the actual version the feature was introduced in rather than having to mentally compute that from the prior version.

Sure. But we don’t have a C++14 standard yet. It’s just “the one after 11”. Right now, _LIBCPP_STD_VER is set to “some value > 11” if you compile with "-std=c++1y” (actually 13). When the standard is approved (probably in July), then I’ll change _LIBCPP_STD_VER to 14 when compiling with “-std=c++14”. (and make a new value for “the standard after 14”).

This is keyed off the value of __cplusplus, which is defined in the standard.  But until we have a C++14 standard, we won’t have a value for __cplusplus (though I would guess it will be 201407L).

— Marshall






-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140219/bea4706f/attachment.html>


More information about the cfe-commits mailing list