[llvm] [TableGen] Fix prefix detection with anchor (NFC) (PR #71379)
Nikita Popov via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 6 03:10:03 PST 2023
https://github.com/nikic created https://github.com/llvm/llvm-project/pull/71379
instregex uses an optimization where the constant prefix of the regex is extracted and used to perform a binary search first. However, this optimization currently fails to apply in most cases, because most instregex uses have an explicit ^ anchor, which gets counted as a metacharacter and disables the optimization.
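Here is why a literal prefix enables a binary search. If the names are kept in sorted order, all names that share a literal prefix form one contiguous run. That run can be found in O(log n) comparisons, and only the names inside it need to be tested against the rest of the regex. The following is a minimal C++17 sketch under that assumption. `prefixRange` is a hypothetical helper working on plain strings; it is not TableGen's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: returns the half-open index range [first, last) of
// the names in `Sorted` that start with `Prefix`. `Sorted` must be in
// ascending lexicographic order, so names sharing a prefix are contiguous.
std::pair<std::size_t, std::size_t>
prefixRange(const std::vector<std::string> &Sorted, std::string_view Prefix) {
  // First name that does not compare less than the prefix itself.
  auto First = std::lower_bound(
      Sorted.begin(), Sorted.end(), Prefix,
      [](const std::string &Name, std::string_view P) {
        return std::string_view(Name) < P;
      });
  // From there, names starting with the prefix come first, so the end of
  // the run is a partition point.
  auto Last = std::partition_point(
      First, Sorted.end(), [Prefix](const std::string &Name) {
        return std::string_view(Name).substr(0, Prefix.size()) == Prefix;
      });
  return {static_cast<std::size_t>(First - Sorted.begin()),
          static_cast<std::size_t>(Last - Sorted.begin())};
}
```

An empty prefix, which is what a leading ^ effectively produced before this patch, yields the full range. In that case every name has to be run through the regex.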
Make sure the anchor is skipped when determining the prefix. Also fix an implementation bug this exposes, where we pick a prefix that is too long if the first metacharacter is a quantifier.
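The following minimal sketch shows the prefix computation described above. It uses std::string_view in place of LLVM's StringRef, and `splitRegexPrefix` and `PrefixSplit` are hypothetical names. The sketch strips a leading ^, finds the first metacharacter, and backs off by one character when that metacharacter is a quantifier. It deliberately omits the separate top-level | or ? check that appears in the patch context, which disables the prefix entirely.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical result type: the pieces of an instregex-style pattern.
struct PrefixSplit {
  bool HadAnchor;          // pattern started with an explicit '^'
  std::string_view Prefix; // literal part usable for a binary search
  std::string_view Rest;   // remaining regex, still to be matched
};

// Hypothetical sketch of the prefix split (no top-level '|'/'?' handling).
PrefixSplit splitRegexPrefix(std::string_view Pattern) {
  PrefixSplit S{false, {}, {}};
  // Drop a leading '^' so it is not taken as the first metacharacter.
  if (!Pattern.empty() && Pattern.front() == '^') {
    S.HadAnchor = true;
    Pattern.remove_prefix(1);
  }
  std::size_t FirstMeta = Pattern.find_first_of("()^$|*+?.[]\\{}");
  // A quantifier applies to the character before it, so that character
  // cannot belong to the literal prefix (ABC* -> prefix AB).
  if (FirstMeta != std::string_view::npos && FirstMeta > 0) {
    char C = Pattern[FirstMeta];
    if (C == '+' || C == '*' || C == '?')
      --FirstMeta;
  }
  S.Prefix = Pattern.substr(0, FirstMeta);
  // Unlike StringRef, string_view::substr throws for pos > size(), so guard npos.
  S.Rest = FirstMeta == std::string_view::npos ? std::string_view()
                                               : Pattern.substr(FirstMeta);
  return S;
}
```

For example, ^ABC* yields the prefix AB and the remainder C*. Before the fix, the anchor itself was the first metacharacter, so the prefix was always empty.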
This cuts the time needed to generate files like X86GenInstrInfo.inc by half.
From 64f5c675bb19eb8efa13e0e861aba03d0df7ebcd Mon Sep 17 00:00:00 2001
From: Nikita Popov <npopov at redhat.com>
Date: Mon, 6 Nov 2023 11:10:00 +0100
Subject: [PATCH] [TableGen] Fix prefix detection with anchor (NFC)
instregex uses an optimization where the constant prefix of the
regex is extracted and used to perform a binary search first.
However, this optimization currently fails to apply in most cases,
because most instregex uses have an explicit ^ anchor, which gets
counted as a metacharacter and disables the optimization.

Make sure the anchor is skipped when determining the prefix. Also
fix an implementation bug this exposes, where we pick a prefix that
is too long if the first metacharacter is a quantifier.

This cuts the time needed to generate files like X86GenInstrInfo.inc
by half.
---
llvm/utils/TableGen/CodeGenSchedule.cpp | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/llvm/utils/TableGen/CodeGenSchedule.cpp b/llvm/utils/TableGen/CodeGenSchedule.cpp
index c3c5e4f8eb2d8c3..54463da19821476 100644
--- a/llvm/utils/TableGen/CodeGenSchedule.cpp
+++ b/llvm/utils/TableGen/CodeGenSchedule.cpp
@@ -91,10 +91,25 @@ struct InstRegexOp : public SetTheory::Operator {
         PrintFatalError(Loc, "instregex requires pattern string: " +
                                  Expr->getAsString());
       StringRef Original = SI->getValue();
+      // Drop an explicit ^ anchor to not interfere with prefix search.
+      bool HadAnchor = Original.consume_front("^");

       // Extract a prefix that we can binary search on.
       static const char RegexMetachars[] = "()^$|*+?.[]\\{}";
       auto FirstMeta = Original.find_first_of(RegexMetachars);
+      if (FirstMeta != StringRef::npos && FirstMeta > 0) {
+        // If we have a regex like ABC* we can only use AB as the prefix, as
+        // the * acts on C.
+        switch (Original[FirstMeta]) {
+        case '+':
+        case '*':
+        case '?':
+          --FirstMeta;
+          break;
+        default:
+          break;
+        }
+      }

       // Look for top-level | or ?. We cannot optimize them to binary search.
       if (removeParens(Original).find_first_of("|?") != std::string::npos)
@@ -106,7 +121,10 @@ struct InstRegexOp : public SetTheory::Operator {
       if (!PatStr.empty()) {
         // For the rest use a python-style prefix match.
         std::string pat = std::string(PatStr);
-        if (pat[0] != '^') {
+        // Add ^ anchor. If we had one originally, don't need the group.
+        if (HadAnchor) {
+          pat.insert(0, "^");
+        } else {
           pat.insert(0, "^(");
           pat.insert(pat.end(), ')');
         }
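The second hunk decides how the regex remainder is anchored before it is matched. The following sketch assumes that the remainder is matched against the part of the instruction name after the literal prefix; the diff does not confirm this. It also uses std::regex in POSIX extended mode as a stand-in for llvm::Regex. `anchorRemainder` and `matchesAfterPrefix` are hypothetical helpers. When the pattern had a ^, re-adding it restores the author's own regex. Otherwise, the ^(...) group makes the anchor apply to every top-level alternative.

```cpp
#include <regex>
#include <string>
#include <string_view>

// Hypothetical helper mirroring the second hunk: turn the regex remainder
// into a pattern anchored at the start of the text it is matched against.
std::string anchorRemainder(std::string_view Rest, bool HadAnchor) {
  std::string Pat(Rest);
  if (HadAnchor)
    return "^" + Pat;      // restores the author's own anchored regex
  return "^(" + Pat + ")"; // group so ^ covers every top-level alternative
}

// Hypothetical matcher. Assumption: the remainder is matched against the
// name with the literal prefix removed; std::regex stands in for llvm::Regex.
bool matchesAfterPrefix(const std::string &Name, std::string_view Prefix,
                        std::string_view Rest, bool HadAnchor) {
  // The literal prefix must match exactly; the binary search described in
  // the PR is what narrows candidates down to this part.
  if (std::string_view(Name).substr(0, Prefix.size()) != Prefix)
    return false;
  // With no remainder, a prefix match alone is enough.
  if (Rest.empty())
    return true;
  std::regex Re(anchorRemainder(Rest, HadAnchor), std::regex::extended);
  return std::regex_search(Name.begin() + Prefix.size(), Name.end(), Re);
}
```

For instance, without an original anchor the remainder A|B becomes ^(A|B). Writing ^A|B instead would anchor only the first alternative.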