[llvm] bb5e26d - [Support] Fix alternation support in backreferences (PR60073)

Nikita Popov via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 17 00:58:18 PST 2023


Author: Nikita Popov
Date: 2023-01-17T09:58:10+01:00
New Revision: bb5e26dad9512e5d60a0462edf0d07044d21e22e

URL: https://github.com/llvm/llvm-project/commit/bb5e26dad9512e5d60a0462edf0d07044d21e22e
DIFF: https://github.com/llvm/llvm-project/commit/bb5e26dad9512e5d60a0462edf0d07044d21e22e.diff

LOG: [Support] Fix alternation support in backreferences (PR60073)

backref() always performs a full match on the remaining string,
and as such also needs to be matched against the whole remaining
strip. For alternations, the match was performed against just the
sub-strip for one alternative, which would of course fail to match
the whole string.

This can be done by skipping the part of the strip between OOR1
and O_CH, so that only the first alternative in the strip is
matched, and the remaining ones are skipped. Indeed, the necessary
OOR1 skipping code was already implemented in the easy-path of
backref(), so this is clearly how it was supposed to work.

However, there were two bugs: First, under this scheme we should
be passing the stop point of the original strip, not just the
alternative sub-strip. Second, while skipping for OOR1 was
implemented, handling for O_CH was missing. This would occur when
the last alternative matches, as O_CH is preceded by an implicit
OOR1 only.

Fixes https://github.com/llvm/llvm-project/issues/60073.

Added: 
    

Modified: 
    llvm/lib/Support/regengine.inc
    llvm/unittests/Support/RegexTest.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Support/regengine.inc b/llvm/lib/Support/regengine.inc
index 3b7014a3d3fb9..b32392a861200 100644
--- a/llvm/lib/Support/regengine.inc
+++ b/llvm/lib/Support/regengine.inc
@@ -590,6 +590,7 @@ backref(struct match *m, const char *start, const char *stop, sopno startst,
 				return(NULL);
 			break;
 		case O_QUEST:
+		case O_CH:
 			break;
 		case OOR1:	/* matches null but needs to skip */
 			ss++;
@@ -662,7 +663,7 @@ backref(struct match *m, const char *start, const char *stop, sopno startst,
 		esub = ss + OPND(s) - 1;
 		assert(OP(m->g->strip[esub]) == OOR1);
 		for (;;) {	/* find first matching branch */
-			dp = backref(m, sp, stop, ssub, esub, lev, rec);
+			dp = backref(m, sp, stop, ssub, stopst, lev, rec);
 			if (dp != NULL)
 				return(dp);
 			/* that one missed, try next one */

diff  --git a/llvm/unittests/Support/RegexTest.cpp b/llvm/unittests/Support/RegexTest.cpp
index eb8160e466665..78f37cdbd1ef8 100644
--- a/llvm/unittests/Support/RegexTest.cpp
+++ b/llvm/unittests/Support/RegexTest.cpp
@@ -80,6 +80,24 @@ TEST_F(RegexTest, Backreferences) {
   EXPECT_EQ("z", Matches[2].str());
   EXPECT_FALSE(r3.match("a6zb6y"));
   EXPECT_FALSE(r3.match("a6zb7z"));
+
+  Regex r4("(abc|xyz|uvw)_\\1");
+  EXPECT_TRUE(r4.match("abc_abc", &Matches));
+  EXPECT_EQ(2u, Matches.size());
+  EXPECT_FALSE(r4.match("abc_ab", &Matches));
+  EXPECT_FALSE(r4.match("abc_xyz", &Matches));
+
+  Regex r5("(xyz|abc|uvw)_\\1");
+  EXPECT_TRUE(r5.match("abc_abc", &Matches));
+  EXPECT_EQ(2u, Matches.size());
+  EXPECT_FALSE(r5.match("abc_ab", &Matches));
+  EXPECT_FALSE(r5.match("abc_xyz", &Matches));
+
+  Regex r6("(xyz|uvw|abc)_\\1");
+  EXPECT_TRUE(r6.match("abc_abc", &Matches));
+  EXPECT_EQ(2u, Matches.size());
+  EXPECT_FALSE(r6.match("abc_ab", &Matches));
+  EXPECT_FALSE(r6.match("abc_xyz", &Matches));
 }
 
 TEST_F(RegexTest, Substitution) {


        


More information about the llvm-commits mailing list