[llvm] 05c495d - [SpecialCaseList] Filtering Globs with matching prefix and suffix (#164543)
via llvm-commits
llvm-commits at lists.llvm.org
Sat Oct 25 09:48:00 PDT 2025
Author: Vitaly Buka
Date: 2025-10-25T09:47:56-07:00
New Revision: 05c495de132f7609537686f60f312059ea70b4a6
URL: https://github.com/llvm/llvm-project/commit/05c495de132f7609537686f60f312059ea70b4a6
DIFF: https://github.com/llvm/llvm-project/commit/05c495de132f7609537686f60f312059ea70b4a6.diff
LOG: [SpecialCaseList] Filtering Globs with matching prefix and suffix (#164543)
This commit enhances the `SpecialCaseList::GlobMatcher` to filter globs
more efficiently by considering both prefixes and suffixes.
Previously, the `GlobMatcher` used a `RadixTree` to store globs based
on their prefixes. This allowed for quick lookup of potential matches
by matching the query string's prefix against the stored prefixes.
However, for globs with common prefixes but different suffixes,
unnecessary glob matching attempts could still occur.
This change introduces a nested `RadixTree` structure:
`PrefixSuffixToGlob: RadixTree<Prefix, RadixTree<Suffix, Globs>>`.
Now, when a query string is matched, the matcher first finds the stored
prefixes that are prefixes of the query and then, within each of those
prefix groups, filters further by looking up the reversed query string
among the reversed glob suffixes, keeping only the groups whose suffix
the query ends with. This significantly reduces the number of `Glob::match`
calls, especially for large special case lists with many globs sharing
common prefixes but differing in their suffixes.
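The two-level filtering described above can be sketched without LLVM.
This is only an illustration under stated assumptions, not the code from
this commit: `std::map` stands in for `RadixTree`, probing every prefix
length stands in for `find_prefixes`, only `*` is treated as a wildcard,
and the names `Glob`, `literalPrefix`, `literalSuffix`, `globMatch`,
`buildIndex` and `matchAll` are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified glob: only '*' is a wildcard.
struct Glob {
  std::string Pattern;
  unsigned LineNo;
};

// Literal text before the first '*' (the whole pattern if there is none).
std::string literalPrefix(const std::string &P) {
  return P.substr(0, P.find('*'));
}

// Literal text after the last '*' (empty if there is none).
std::string literalSuffix(const std::string &P) {
  std::size_t Star = P.rfind('*');
  return Star == std::string::npos ? std::string() : P.substr(Star + 1);
}

// Classic backtracking wildcard match supporting '*' only.
bool globMatch(const std::string &P, const std::string &S) {
  std::size_t PI = 0, SI = 0, Star = std::string::npos, Mark = 0;
  while (SI < S.size()) {
    if (PI < P.size() && P[PI] == '*') {
      Star = PI++; // remember the star and try matching it to nothing
      Mark = SI;
    } else if (PI < P.size() && P[PI] == S[SI]) {
      ++PI;
      ++SI;
    } else if (Star != std::string::npos) {
      PI = Star + 1; // let the last star absorb one more character
      SI = ++Mark;
    } else {
      return false;
    }
  }
  while (PI < P.size() && P[PI] == '*')
    ++PI;
  return PI == P.size();
}

// prefix -> reversed suffix -> globs sharing that prefix and suffix.
using SuffixMap = std::map<std::string, std::vector<const Glob *>>;
using PrefixSuffixMap = std::map<std::string, SuffixMap>;

// Globs must outlive the index, which stores pointers into it.
PrefixSuffixMap buildIndex(const std::vector<Glob> &Globs) {
  PrefixSuffixMap Index;
  for (const Glob &G : Globs) {
    std::string Suffix = literalSuffix(G.Pattern);
    std::string Reversed(Suffix.rbegin(), Suffix.rend());
    Index[literalPrefix(G.Pattern)][Reversed].push_back(&G);
  }
  return Index;
}

// Returns the line numbers of matching globs; Attempts (optional) counts
// how many full glob matches were actually tried.
std::vector<unsigned> matchAll(const PrefixSuffixMap &Index,
                               const std::string &Query,
                               unsigned *Attempts = nullptr) {
  std::vector<unsigned> Lines;
  std::string RevQuery(Query.rbegin(), Query.rend());
  for (std::size_t PLen = 0; PLen <= Query.size(); ++PLen) {
    auto PIt = Index.find(Query.substr(0, PLen));
    if (PIt == Index.end())
      continue;
    for (std::size_t SLen = 0; SLen <= RevQuery.size(); ++SLen) {
      auto SIt = PIt->second.find(RevQuery.substr(0, SLen));
      if (SIt == PIt->second.end())
        continue;
      for (const Glob *G : SIt->second) {
        if (Attempts)
          ++*Attempts;
        if (globMatch(G->Pattern, Query)) {
          Lines.push_back(G->LineNo);
          break; // first match in this group is enough, as in the patch
        }
      }
    }
  }
  return Lines;
}
```

With globs such as `src/*.cpp`, `src/*.h` and `src/*_test.cpp`, a query
like `src/foo.h` reaches only the `src/` + `.h` group (plus any group with
an empty prefix and a matching suffix), so the `.cpp` globs are never
passed to the full matcher even though they share the `src/` prefix.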
According to SpecialCaseListBM:
Lookup benchmarks (significant improvements):
```
OVERALL_GEOMEAN -0.5815
```
Lookup benchmarks for `*suffix`-like and `prefix*suffix`-like globs (huge
improvements):
```
OVERALL_GEOMEAN -0.9316
```
https://gist.github.com/vitalybuka/e586751902760ced6beefcdf0d7b26fd
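To put the geomean figures in more familiar terms, the sketch below
converts them to speedup factors. It assumes the values are relative time
changes, `(new - old) / old`, in the style of Google Benchmark's
`compare.py` output; the message itself does not state this. The helper
name `speedupFromRelativeChange` is hypothetical.

```cpp
// Assumption: Delta is a relative time change, (new - old) / old, so the
// new time is (1 + Delta) * old and the speedup factor is old / new.
double speedupFromRelativeChange(double Delta) {
  return 1.0 / (1.0 + Delta);
}
```

Under that assumption, -0.5815 corresponds to roughly a 2.4x speedup and
-0.9316 to roughly a 14.6x speedup.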
Added:
Modified:
llvm/include/llvm/Support/SpecialCaseList.h
llvm/lib/Support/SpecialCaseList.cpp
Removed:
################################################################################
diff --git a/llvm/include/llvm/Support/SpecialCaseList.h b/llvm/include/llvm/Support/SpecialCaseList.h
index a235975b152c3..860f73c798e41 100644
--- a/llvm/include/llvm/Support/SpecialCaseList.h
+++ b/llvm/include/llvm/Support/SpecialCaseList.h
@@ -167,8 +167,9 @@ class SpecialCaseList {
std::vector<GlobMatcher::Glob> Globs;
RadixTree<iterator_range<StringRef::const_iterator>,
- SmallVector<const GlobMatcher::Glob *, 1>>
- PrefixToGlob;
+ RadixTree<iterator_range<StringRef::const_reverse_iterator>,
+ SmallVector<const GlobMatcher::Glob *, 1>>>
+ PrefixSuffixToGlob;
};
/// Represents a set of patterns and their line numbers
diff --git a/llvm/lib/Support/SpecialCaseList.cpp b/llvm/lib/Support/SpecialCaseList.cpp
index c27f627446203..3a9718569a06f 100644
--- a/llvm/lib/Support/SpecialCaseList.cpp
+++ b/llvm/lib/Support/SpecialCaseList.cpp
@@ -92,8 +92,10 @@ void SpecialCaseList::GlobMatcher::preprocess(bool BySize) {
for (const auto &G : reverse(Globs)) {
StringRef Prefix = G.Pattern.prefix();
+ StringRef Suffix = G.Pattern.suffix();
- auto &V = PrefixToGlob.emplace(Prefix).first->second;
+ auto &SToGlob = PrefixSuffixToGlob.emplace(Prefix).first->second;
+ auto &V = SToGlob.emplace(reverse(Suffix)).first->second;
V.emplace_back(&G);
}
}
@@ -101,16 +103,18 @@ void SpecialCaseList::GlobMatcher::preprocess(bool BySize) {
void SpecialCaseList::GlobMatcher::match(
StringRef Query,
llvm::function_ref<void(StringRef Rule, unsigned LineNo)> Cb) const {
- if (!PrefixToGlob.empty()) {
- for (const auto &[_, V] : PrefixToGlob.find_prefixes(Query)) {
- for (const auto *G : V) {
- if (G->Pattern.match(Query)) {
- Cb(G->Name, G->LineNo);
- // As soon as we find a match in the vector, we can break for this
- // vector, since the globs are already sorted by priority within the
- // prefix group. However, we continue searching other prefix groups in
- // the map, as they may contain a better match overall.
- break;
+ if (!PrefixSuffixToGlob.empty()) {
+ for (const auto &[_, SToGlob] : PrefixSuffixToGlob.find_prefixes(Query)) {
+ for (const auto &[_, V] : SToGlob.find_prefixes(reverse(Query))) {
+ for (const auto *G : V) {
+ if (G->Pattern.match(Query)) {
+ Cb(G->Name, G->LineNo);
+ // As soon as we find a match in the vector, we can break for this
+ // vector, since the globs are already sorted by priority within the
+ // prefix group. However, we continue searching other prefix groups
+ // in the map, as they may contain a better match overall.
+ break;
+ }
}
}
}