[PATCH] D40060: [clangd] Fuzzy match scorer

Fri Dec 1 09:08:13 PST 2017

sammccall marked an inline comment as done.
sammccall added a comment.

I'd broken the scoring scale with the last few tweaks:

- The harsh pattern-split penalty was driving too many decent matches to 0 score
- The case-insensitive change resulted in some perfect prefix matches not getting perfect scores

Added tweaks to address these. Match quality is now 0-3, with default being 1.
Happy to make followup changes, but this seems unlikely to be controversial :-)

================
Comment at: clangd/FuzzyMatch.h:53
+
+  int PatN, WordN;                               // Length of pattern and word.
+  char Pat[MaxPat], Word[MaxWord];               // Raw pattern and word data.
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > ilya-biryukov wrote:
> > > Maybe we could split the data we store into two sections:
> > > 1. Pattern-specific data. Initialized on construction, never changed later.
> > > 2. Per-match data. Initialized per `match()` call.
> > > 
> > > Otherwise it is somewhat hard to certify whether everything is being initialized properly.
> > This hides the parallels between the Pattern and Word data, I'm not sure I like it better overall.
> > 
> > I've added a comment describing this split, reordered some variables, and renamed IsSubstring to WordContainsPattern, which I think clarifies this a bit. WDYT?
> I'd prefer grouping the fields by their lifetime in that case, because it makes certifying that everything was properly initialized easier. Which is especially a big deal when changing code to avoid silly initialization-related bugs.
> Grouping by meaning also makes lots of sense, of course, but logical relations are only hard to grasp when reading the code and don't usually cause subtle bugs when rewriting the code. And proper comments allow to reintroduce those logical parallels.
> 
> But that could be accounted to my personal preference, so feel free to leave the code as is. Just wanted to clarify my point a bit more.
Makes sense. I've split the fields as you  suggest, it also reads well.

https://reviews.llvm.org/D40060