[PATCH] D23981: Add StringRef::scan_between

Zachary Turner via llvm-commits llvm-commits at lists.llvm.org
Mon Aug 29 14:53:18 PDT 2016

zturner added inline comments.

Comment at: unittests/ADT/StringRefTest.cpp:382
@@ +381,3 @@
+  EXPECT_EQ("ScanBetween", Str.scan_between("::", "()"));
+  EXPECT_EQ("::ScanBetween()", Str.scan_between("StringRefTest", "ABCDE"));
+  EXPECT_EQ("", Str.scan_between("ABCDE", "::"));
amccarth wrote:
> zturner wrote:
> > amccarth wrote:
> > > The expectation here surprised me.  Given that "ABCDE" doesn't exist in the string, I'd expect there to be no match (as with the next example).  But there's an asymmetry here.
> > > 
> > > I'm not necessarily opposed to this, but I think it's worth discussing since surprising behavior from an API leads to misuse of the API.  At a minimum, the description of the method needs to be corrected to make this behavior explicit.
> > > 
> > > Consider `Foo.scan_between("(", ")")`.  Would you really want a non-empty result if `Foo` did not have a matched pair of parentheses?
> > When I came up with these semantics, I decided to treat "I didn't find the string" not as an error, but as an "I made it to the end, keep going".
> > 
> > This roughly matches the semantics of `split()`, where if if `A.split(B)` doesn't find `B`, it returns `(A, "")`.
> `split` doesn't document what it does if you pass an empty string as a separator.  I'd expect an empty string to match at the first position, which is what happens with `std::string::find`.  If `scan_between` were to use the same semantics as `split`, I'd expect `A.scan_between("", "")` to return an empty string rather than `A`.
The API should be able to handle the case where you say "I know the substring foo is in here, return me everything after foo".  Likewise, it should be able to handle the case of "I know the substring foo is in here, return me everything before foo".  With the current API, those would be expressed as `A.scan_between("foo", "")` and `A.scan_between("", "foo")` respectively.  If `Right.empty()` is not treated as matching the end of the string, then there would be no way to implement the first case at all, which seems kind of fundamental for an api such as this.

We could add an overload that takes a `size_t` for the second argument, allowing you to pass `npos`, but I'm not sure if anyone would ever use that api for any other purpose.  Alternatively, we could add a single argument overload, but it might be confusing to have `A.scan_between("foo")` return something different from `A.scan_between("foo", "")`


More information about the llvm-commits mailing list