[PATCH] D27188: Use trigrams to speed up SpecialCaseList.
Peter Collingbourne via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 29 22:54:03 PST 2016
pcc added a comment.
Nice!
Do you have before/after performance numbers for a large Chromium TU? Maybe also measure without the blacklist for a baseline.
================
Comment at: lib/Support/Regex.cpp:199
+ // Check for regex metacharacters other than '*' and '.'.
+ return Str.find_first_of(RegexAdvancedMetachars) == StringRef::npos;
+}
----------------
Does this belong in `Regex`? It feels more like an implementation detail of `TrigramIndex`.
================
Comment at: lib/Support/SpecialCaseList.cpp:34
+ std::vector<int> Counts;
+ std::unordered_map<unsigned, std::vector<size_t>> Index;
+
----------------
Maybe use a SmallVector instead of std::vector since the size is bounded?
This code also needs more comments. Please at least leave comments for TrigramIndex and its fields.
================
Comment at: lib/Support/SpecialCaseList.cpp:47
+ std::set<unsigned> Was;
+ size_t I = Counts.size();
+ unsigned Tri = 0;
----------------
Maybe fold this into the use? I found it a little confusing to use a variable named "I" for something other than a loop counter.
================
Comment at: lib/Support/SpecialCaseList.cpp:50
+ int Len = 0;
+ for (size_t J = 0; J < Regex.size(); J++) {
+ if (Regex[J] == '.' || Regex[J] == '*') {
----------------
Maybe use a range for loop here?
================
Comment at: lib/Support/SpecialCaseList.cpp:67
+ Cnt++;
+ if (!Was.count(Tri)) {
+ Index[Tri].push_back(I);
----------------
I wonder how important it is to test `Was` here. It looks like things would still work if you allow trigrams to have multiple identical indices, and as you mention in the comment above it isn't worth worrying about popular trigrams.
================
Comment at: lib/Support/SpecialCaseList.cpp:82
+ bool isDefinitelyOut(StringRef Query) const {
+ if (Defeated) {
+ return false;
----------------
Remove braces
https://reviews.llvm.org/D27188
More information about the llvm-commits
mailing list