[llvm] 0a0e06f - [TableGen] Fix prefix detection with anchor (NFC) (#71379)
via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 13 06:47:19 PST 2023
Author: Nikita Popov
Date: 2023-11-13T15:47:15+01:00
New Revision: 0a0e06f29145213e90de88ca39f7b505ce092a4a
URL: https://github.com/llvm/llvm-project/commit/0a0e06f29145213e90de88ca39f7b505ce092a4a
DIFF: https://github.com/llvm/llvm-project/commit/0a0e06f29145213e90de88ca39f7b505ce092a4a.diff
LOG: [TableGen] Fix prefix detection with anchor (NFC) (#71379)
instregex uses an optimization in which the constant prefix of the regex
is extracted so that a binary search can be performed first. However, this
optimization currently fails to apply in most cases, because most instregex
uses have an explicit ^ anchor, which is counted as a metacharacter and
disables the optimization.
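As a rough illustration of why a constant prefix helps, the sketch below narrows a sorted list of names to those starting with a given prefix using two binary searches. It is a standalone approximation, not the TableGen code: the function name prefixRange and the use of std::vector<std::string> are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: returns the half-open index range [first, second) of
// entries in SortedNames that start with Prefix. SortedNames must be sorted
// lexicographically; truncating each name to Prefix.size() keeps that order,
// so both binary searches see a properly partitioned range.
std::pair<std::size_t, std::size_t>
prefixRange(const std::vector<std::string> &SortedNames,
            std::string_view Prefix) {
  auto Lo = std::lower_bound(
      SortedNames.begin(), SortedNames.end(), Prefix,
      [](const std::string &Name, std::string_view P) {
        return std::string_view(Name).substr(0, P.size()) < P;
      });
  auto Hi = std::upper_bound(
      Lo, SortedNames.end(), Prefix,
      [](std::string_view P, const std::string &Name) {
        return P < std::string_view(Name).substr(0, P.size());
      });
  return {static_cast<std::size_t>(Lo - SortedNames.begin()),
          static_cast<std::size_t>(Hi - SortedNames.begin())};
}
```

Only the names inside the returned range would then need to be checked against the remainder of the regex. With an empty prefix the range covers every name, so the search saves nothing.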
Make sure the anchor is skipped when determining the prefix. Also fix an
implementation bug this exposes, where we pick a too-long prefix if the
first metacharacter is a quantifier.
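The prefix computation described above can be sketched in isolation as follows. This is a simplified approximation using std::string_view rather than LLVM's StringRef; the SplitPattern struct and splitPattern function are hypothetical names for this example, and the check for a top-level | or ? that the real code performs is left out.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical result type for this sketch; not part of TableGen.
struct SplitPattern {
  bool HadAnchor;     // the pattern started with an explicit '^'
  std::string Prefix; // literal prefix usable for a binary search
  std::string Rest;   // remainder that still needs regex matching
};

SplitPattern splitPattern(std::string_view Pattern) {
  SplitPattern Result{false, "", ""};

  // Drop an explicit ^ anchor so it is not counted as a metacharacter.
  if (!Pattern.empty() && Pattern.front() == '^') {
    Result.HadAnchor = true;
    Pattern.remove_prefix(1);
  }

  constexpr std::string_view Metachars = "()^$|*+?.[]\\{}";
  std::size_t FirstMeta = Pattern.find_first_of(Metachars);

  // For a pattern like ABC* only AB is a constant prefix, because the
  // quantifier acts on the preceding character.
  if (FirstMeta != std::string_view::npos && FirstMeta > 0) {
    char C = Pattern[FirstMeta];
    if (C == '*' || C == '+' || C == '?')
      --FirstMeta;
  }

  Result.Prefix = std::string(Pattern.substr(0, FirstMeta));
  if (FirstMeta != std::string_view::npos)
    Result.Rest = std::string(Pattern.substr(FirstMeta));
  return Result;
}
```

Under these assumptions, "^ABC*" splits into the prefix "AB" and the remainder "C*", while "^ADD32rr$" yields the prefix "ADD32rr" and the remainder "$". Before the anchor was skipped, the leading ^ would have been the first metacharacter and the prefix would have been empty.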
This cuts the time needed to generate files like X86GenInstrInfo.inc by
half.
Added:
Modified:
llvm/utils/TableGen/CodeGenSchedule.cpp
Removed:
################################################################################
diff --git a/llvm/utils/TableGen/CodeGenSchedule.cpp b/llvm/utils/TableGen/CodeGenSchedule.cpp
index c3c5e4f8eb2d8c3..54463da19821476 100644
--- a/llvm/utils/TableGen/CodeGenSchedule.cpp
+++ b/llvm/utils/TableGen/CodeGenSchedule.cpp
@@ -91,10 +91,25 @@ struct InstRegexOp : public SetTheory::Operator {
PrintFatalError(Loc, "instregex requires pattern string: " +
Expr->getAsString());
StringRef Original = SI->getValue();
+ // Drop an explicit ^ anchor to not interfere with prefix search.
+ bool HadAnchor = Original.consume_front("^");
// Extract a prefix that we can binary search on.
static const char RegexMetachars[] = "()^$|*+?.[]\\{}";
auto FirstMeta = Original.find_first_of(RegexMetachars);
+ if (FirstMeta != StringRef::npos && FirstMeta > 0) {
+ // If we have a regex like ABC* we can only use AB as the prefix, as
+ // the * acts on C.
+ switch (Original[FirstMeta]) {
+ case '+':
+ case '*':
+ case '?':
+ --FirstMeta;
+ break;
+ default:
+ break;
+ }
+ }
// Look for top-level | or ?. We cannot optimize them to binary search.
if (removeParens(Original).find_first_of("|?") != std::string::npos)
@@ -106,7 +121,10 @@ struct InstRegexOp : public SetTheory::Operator {
if (!PatStr.empty()) {
// For the rest use a python-style prefix match.
std::string pat = std::string(PatStr);
- if (pat[0] != '^') {
+ // Add ^ anchor. If we had one originally, don't need the group.
+ if (HadAnchor) {
+ pat.insert(0, "^");
+ } else {
pat.insert(0, "^(");
pat.insert(pat.end(), ')');
}