[PATCH] D59300: [clangd] Tune the fuzzy-matching algorithm

Ilya Biryukov via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Mar 15 02:44:35 PDT 2019

ilya-biryukov marked 5 inline comments as done.
ilya-biryukov added inline comments.

Comment at: clang-tools-extra/clangd/FuzzyMatch.cpp:74
 static bool isAwful(int S) { return S < AwfulScore / 2; }
-static constexpr int PerfectBonus = 3; // Perfect per-pattern-char score.
+static constexpr int PerfectBonus = 4; // Perfect per-pattern-char score.
ioeric wrote:
> could you explain this change?
We give an extra bonus point for a consecutive match now, so the maximum bonus is now higher.
This constant is used for score normalization to bring the final score into `[0,1]` range (IIUC, there's a special case when the score can be `2.0`, in case of a substring match, but the normalization is still applied to per-char scores)

Comment at: clang-tools-extra/clangd/FuzzyMatch.cpp:270
 int FuzzyMatcher::skipPenalty(int W, Action Last) const {
-  int S = 0;
+  if (W == 0) // Skipping the first character.
+    return 3;
ioeric wrote:
> If the word is "_something", do we still want to penalize skipping first char?
Yes and it should actually play out nicely in practice. Consider two possibilities:
1. the pattern starts with `_`. In that case we want the items starting with `_` to be ranked higher, penalizing other items achieves this.
2. the pattern does not start with `_`. In that case all the items, starting with `_` will get a penalty and we will prefer items starting with the first character of the pattern. Also looks desirable.

This penalty in some sense replaces the `prefix match` boost we had before.

Comment at: clang-tools-extra/clangd/FuzzyMatch.cpp:275
+              // match.
+  return 0;
ioeric wrote:
> iiuc, there is no penalty by default because consecutive matches will gain bonus. This is not trivial here. Maybe add a comment? Or consider apply penalty for non-consecutive matches and no bonus for consecutive ones.
Added a comment. Adding a bonus instead of a penalty seems to produce smaller discrepancies between match and no-match cases for small patterns.

I.e. the non-consecutive match of a next segment gets a smaller hit in the final score.
I.e.  a match of `p` in `[u]nique_[p]tr` gets a score of `2 out of 4` and no penalties associated with it now, previously it would get a score of `2 out of 3` but a penalty of `2` for a non-consecutive match, which just turned out to be too much.

Comment at: clang-tools-extra/clangd/FuzzyMatch.cpp:285
+  if (Pat[P] == Word[W] ||
+      (WordRole[W] == Head && (IsPatSingleCase || PatRole[P] == Head)))
ioeric wrote:
> could you explain the intention of this change? Is it relevant to other changes in the patch? The original looks reasonable to me. 
The motivation is keeping a bonus for matching the head of a segment even in case of single-case patterns that lack segmentation signals.
Previously, `p` from `up` would not get this bonus when matching `unique_[p]tr` even though intuitively it should as we do want to encourage those matches.

  rG LLVM Github Monorepo



More information about the cfe-commits mailing list