[PATCH] D40060: [clangd] Fuzzy match scorer

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Nov 23 08:55:22 PST 2017


sammccall added a comment.

Thanks for the review, and sorry for the subtlety of the code and sparse comments.
It should be a little better now, please let me know which parts aren't clear enough.



================
Comment at: clangd/FuzzyMatch.cpp:69
+    : NPat(std::min<int>(MaxPat, Pattern.size())), NWord(0),
+      ScoreScale(0.5f / NPat) {
+  memcpy(Pat, Pattern.data(), NPat);
----------------
klimek wrote:
> Why .5?
The .5 and the 2 are the same thing. Extracted a constant with a comment.


================
Comment at: clangd/FuzzyMatch.cpp:92
+// Segmentation of words and patterns
+// We mark each character as the Head or Tail of a segment, or a Separator.
+// e.g. XMLHttpRequest_Async
----------------
klimek wrote:
> Do you mean "part of the Head or Tail"?
> Also, explain that these are the CharRoles. A reader reads this first, and will search for what CharRole means in the code later. CharRole is defined in a different file, without comments, so figuring out how that all relates is super hard :)
Rewrote this section and added more comments. CharRole is defined here now.

Each character in a segment that isn't the Head is the Tail. It's a bit of a stretch, but it's short and evocative and (now) explained with examples.


================
Comment at: clangd/FuzzyMatch.cpp:110
+// 4 packed CharTypes per entry, indexed by char.
+constexpr uint8_t CharTypes[] = {
+    0x00, 0x00, 0x00, 0x00, // Control characters
----------------
klimek wrote:
> Finding bugs in these will be hard :)
Ack :-(


================
Comment at: clangd/FuzzyMatch.cpp:120
+    0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, // Bytes over 127 -> Lower.
+    0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, // This includes UTF-8.
+    0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55,
----------------
klimek wrote:
> Can you expand in the comment why this works for utf-8?
Done. It doesn't really "work", so much as we just give up...


================
Comment at: clangd/FuzzyMatch.cpp:212
+  ScoreT S = 0;
+  // Bonus: pattern is part of a word prefix.
+  if (P == W)
----------------
klimek wrote:
> Why does P == W imply that?
Every pattern character must match in order, so a match with P < W is impossible, and P == W means the match is perfect so far.
(Also explained in the comment)


================
Comment at: clangd/FuzzyMatch.cpp:215
+    ++S;
+  // Bonus: case matches, or an asserted word break matches an actual.
+  if (Pat[P] == Word[W] || (PatRole[P] == Head && WordRole[W] == Head))
----------------
klimek wrote:
> This is the first time the term "asserted word break" shows up, perhaps explain this when explaining the roles.
Rephrased the comment here to use the familiar terminology.


================
Comment at: clangd/FuzzyMatch.h:39
+
+private:
+  constexpr static int MaxPat = 63;
----------------
klimek wrote:
> I find most of the abbreviations here non-intuitive, and thus needing comments (Pat I get is for pattern :)
> N - what does it mean? Number of characters in the pattern? I'd use Length instead.
> LPat and LWord (no idea what L could stand for).
> 
Comments added throughout.

> N - what does it mean? Number of characters in the pattern? I'd use Length instead.

`WordLength` is too long for something so common I think, these each have >20 refs. Changed `NWord` -> `WordN` which I think reads better - `Word` and `WordN` form a nice pair.

(N rather than L for length because of confusion with Lower)

Changed `LWord` to `LowWord` etc.


================
Comment at: clangd/FuzzyMatch.h:50
+
+  int NPat, NWord;
+  char Pat[MaxPat];
----------------
klimek wrote:
> I'd use a StringRef instead, and call the storage *Storage or something.
What's the goal here?

I have a couple of objections to this:
 - if you actually use StringRef[] to access the data, now you've got a gratuitous indirection everywhere
 - For `LPat`/`LWord` too? Now we have *more* members than in the first place, and two ways to write each bounds check.

If the intent is to clean up the places where I construct `StringRef(Word, NWord)` explicitly, adding `StringRef word()` would certainly make sense.


https://reviews.llvm.org/D40060





More information about the cfe-commits mailing list