[llvm] 883c117 - [Regex] Avoid NFA machinery for fixed prefix chars (NFC)

Nikita Popov via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 19 02:13:02 PST 2023


Author: Nikita Popov
Date: 2023-01-19T11:12:53+01:00
New Revision: 883c117d1a4cce3c19aa521fccaf8f938269fc57

URL: https://github.com/llvm/llvm-project/commit/883c117d1a4cce3c19aa521fccaf8f938269fc57
DIFF: https://github.com/llvm/llvm-project/commit/883c117d1a4cce3c19aa521fccaf8f938269fc57.diff

LOG: [Regex] Avoid NFA machinery for fixed prefix chars (NFC)

Similarly to what backref() does, add an "easy path" to slow()
that can handle some non-branching cases, in particular simple
character matches.

This has the dual effect of reducing the number of characters we
need to match, and the number of states in the NFA.

This reduces FileCheck runtime on vloxseg.c from 17s to 12s on
my machine.

Added: 
    

Modified: 
    llvm/lib/Support/regengine.inc

Removed: 
    


################################################################################
diff --git a/llvm/lib/Support/regengine.inc b/llvm/lib/Support/regengine.inc
index 681b7145e4428..f23993abc6e7e 100644
--- a/llvm/lib/Support/regengine.inc
+++ b/llvm/lib/Support/regengine.inc
@@ -810,11 +810,33 @@ static const char *			/* where it ended */
 slow(struct match *m, const char *start, const char *stop, sopno startst,
      sopno stopst)
 {
+	/* Quickly skip over fixed character matches at the start. */
+	const char *p = start;
+	for (; startst < stopst; ++startst) {
+		int hard = 0;
+		sop s = m->g->strip[startst];
+		switch (OP(s)) {
+		case OLPAREN:
+		case ORPAREN:
+			/* Not relevant here. */
+			break;
+		case OCHAR:
+			if (p == stop || *p != (char)OPND(s))
+				return NULL;
+			++p;
+			break;
+		default:
+			hard = 1;
+			break;
+		}
+		if (hard)
+			break;
+	}
+
 	states st = m->st;
 	states empty = m->empty;
 	states tmp = m->tmp;
-	const char *p = start;
-	int c = (start == m->beginp) ? OUT : *(start-1);
+	int c = (p == m->beginp) ? OUT : *(p-1);
 	int lastc;	/* previous c */
 	int flagch;
 	int i;
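
For readers without the rest of regengine.inc at hand, the fast path in the hunk above can be sketched in isolation. This is a minimal illustration, not the LLVM code: the `struct op` encoding, the `op_kind` enum, and the helper name `skip_fixed_prefix` are all hypothetical stand-ins for the engine's packed `sop` values and the `OP()`/`OPND()` macros. It only shows the idea of consuming literal characters, and stepping over group markers, before handing the remainder to the general state-set matcher.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified op encoding (the real engine packs ops into a sop). */
enum op_kind { OP_CHAR, OP_LPAREN, OP_RPAREN, OP_OTHER };

struct op {
	enum op_kind kind;
	char ch;	/* only meaningful for OP_CHAR */
};

/*
 * Consume fixed characters at the front of the program.
 * Returns NULL if a literal fails to match, meaning no match is possible
 * from `start`. Otherwise returns the input position reached and stores in
 * *next_op the index of the first op that still needs the general matcher
 * (equal to nops if the whole program was consumed).
 */
static const char *
skip_fixed_prefix(const struct op *ops, size_t nops, const char *start,
    const char *stop, size_t *next_op)
{
	const char *p = start;
	size_t i;

	assert(next_op != NULL);
	for (i = 0; i < nops; ++i) {
		if (ops[i].kind == OP_LPAREN || ops[i].kind == OP_RPAREN)
			continue;	/* group markers consume no input */
		if (ops[i].kind != OP_CHAR)
			break;		/* anything else: leave it to the slow path */
		if (p == stop || *p != ops[i].ch)
			return NULL;	/* literal mismatch or input exhausted */
		++p;
	}
	*next_op = i;
	return p;
}
```

In the actual patch the same loop advances `startst` in place, and the "previous character" `c` is then computed from the advanced `p` rather than from `start`, so the state-set loop resumes from the position after the consumed prefix.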


        

