[PATCH] D23981: Add StringRef::scan_between
Adrian McCarthy via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 29 15:05:34 PDT 2016
amccarth added inline comments.
Comment at: unittests/ADT/StringRefTest.cpp:382
@@ +381,3 @@
+ EXPECT_EQ("ScanBetween", Str.scan_between("::", "()"));
+ EXPECT_EQ("::ScanBetween()", Str.scan_between("StringRefTest", "ABCDE"));
+ EXPECT_EQ("", Str.scan_between("ABCDE", "::"));
> amccarth wrote:
> > zturner wrote:
> > > amccarth wrote:
> > > > The expectation here surprised me. Given that "ABCDE" doesn't exist in the string, I'd expect there to be no match (as with the next example). But there's an asymmetry here.
> > > >
> > > > I'm not necessarily opposed to this, but I think it's worth discussing since surprising behavior from an API leads to misuse of the API. At a minimum, the description of the method needs to be corrected to make this behavior explicit.
> > > >
> > > > Consider `Foo.scan_between("(", ")")`. Would you really want a non-empty result if `Foo` did not have a matched pair of parentheses?
> > > When I came up with these semantics, I decided to treat "I didn't find the string" not as an error, but as an "I made it to the end, keep going".
> > >
> > > This roughly matches the semantics of `split()`, where if if `A.split(B)` doesn't find `B`, it returns `(A, "")`.
> > `split` doesn't document what it does if you pass an empty string as a separator. I'd expect an empty string to match at the first position, which is what happens with `std::string::find`. If `scan_between` were to use the same semantics as `split`, I'd expect `A.scan_between("", "")` to return an empty string rather than `A`.
> The API should be able to handle the case where you say "I know the substring foo is in here, return me everything after foo". Likewise, it should be able to handle the case of "I know the substring foo is in here, return me everything before foo". With the current API, those would be expressed as `A.scan_between("foo", "")` and `A.scan_between("", "foo")` respectively. If `Right.empty()` is not treated as matching the end of the string, then there would be no way to implement the first case at all, which seems kind of fundamental for an api such as this.
> We could add an overload that takes a `size_t` for the second argument, allowing you to pass `npos`, but I'm not sure if anyone would ever use that api for any other purpose. Alternatively, we could add a single argument overload, but it might be confusing to have `A.scan_between("foo")` return something different from `A.scan_between("foo", "")`
>The API should be able to handle the case where you say "I know the substring foo is in here, return me everything after foo".
Isn't that what `split` is for? Why would someone choose `scan_between` for that?
I think this API behaves in a manner that's surprising enough that it will eventually confuse somebody into using it incorrectly. But that's only for corner cases, so it's probably not important enough. I'm willing to let it go if the behavior is more accurately document in the comments on the method.
More information about the llvm-commits