[PATCH] D59300: [clangd] Tune the fuzzy-matching algorithm

Ilya Biryukov via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Mar 15 04:13:04 PDT 2019


ilya-biryukov added inline comments.


================
Comment at: clang-tools-extra/clangd/FuzzyMatch.cpp:285
+  if (Pat[P] == Word[W] ||
+      (WordRole[W] == Head && (IsPatSingleCase || PatRole[P] == Head)))
     ++S;
----------------
ioeric wrote:
> ilya-biryukov wrote:
> > ioeric wrote:
> > > could you explain the intention of this change? Is it relevant to other changes in the patch? The original looks reasonable to me. 
> > The motivation is keeping a bonus for matching the head of a segment even in case of single-case patterns that lack segmentation signals.
> > Previously, `p` from `up` would not get this bonus when matching `unique_[p]tr` even though intuitively it should as we do want to encourage those matches.
> so, iiuc, when `IsPatSingleCase` is true, it means that any character in the pattern can potentially be a head? could you add comment for this?
> 
> We also had a stricter condition for `Pat[P] == Word[W]`, but now `ut` would get a bonus for `[u]nique_p[t]r`. Is this intended?
> so, iiuc, when IsPatSingleCase is true, it means that any character in the pattern can potentially be a head? could you add comment for this?
Exactly, we don't have any segmentation signals in a pattern and we don't want small drops in scores on each match a segment head, e.g. there's no reason matching `c` should have smaller bonus than `r` when matching `My[StrC]at` with `strc`.
Added a comment.

> We also had a stricter condition for Pat[P] == Word[W], but now ut would get a bonus for [u]nique_p[t]r. Is this intended?
This one is tricky and I'm less sure of it. I think the important bit is to give this bonus to `t` in case of `[u]nique_[pt]r`.
Otherwise we're making a match of the non-head character worse for no reason. I think the real problem in the previous implementation was its preference of prefix matches (the `P == W` condition), which gave an unfair advantage to `[up]per_bound` over `[u]nique_[p]tr`.

Matches in the middle of a segment are harshly penalized in the next line, so this small bonus does not seem to matter in practice for the case you mentioned. Maybe there's a better way to do this.


================
Comment at: clang-tools-extra/clangd/FuzzyMatch.cpp:273
   if (WordRole[W] == Head) // Skipping a segment.
-    S += 1;
-  if (Last == Match) // Non-consecutive match.
-    S += 2;          // We'd rather skip a segment than split our match.
-  return S;
+    return 1; // We want to keep it lower than the bonus for a consecutive
+              // match.
----------------
ioeric wrote:
> nit: move this comment into into own line now that it's taking two?
Shrinked it to fit into a single line.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59300/new/

https://reviews.llvm.org/D59300





More information about the cfe-commits mailing list