[PATCH] D40060: [clangd] Fuzzy match scorer
Sam McCall via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Nov 23 08:55:22 PST 2017
sammccall added a comment.
Thanks for the review, and sorry for the subtlety of the code and sparse comments.
It should be a little better now, please let me know which parts aren't clear enough.
================
Comment at: clangd/FuzzyMatch.cpp:69
+ : NPat(std::min<int>(MaxPat, Pattern.size())), NWord(0),
+ ScoreScale(0.5f / NPat) {
+ memcpy(Pat, Pattern.data(), NPat);
----------------
klimek wrote:
> Why .5?
The .5 and the 2 are the same thing. Extracted a constant with a comment.
================
Comment at: clangd/FuzzyMatch.cpp:92
+// Segmentation of words and patterns
+// We mark each character as the Head or Tail of a segment, or a Separator.
+// e.g. XMLHttpRequest_Async
----------------
klimek wrote:
> Do you mean "part of the Head or Tail"?
> Also, explain that these are the CharRoles. A reader reads this first, and will search for what CharRole means in the code later. CharRole is defined in a different file, without comments, so figuring out how that all relates is super hard :)
Rewrote this section and added more comments. CharRole is defined here now.
Each character in a segment that isn't the Head is the Tail. It's a bit of a stretch, but it's short and evocative and (now) explained with examples.
================
Comment at: clangd/FuzzyMatch.cpp:110
+// 4 packed CharTypes per entry, indexed by char.
+constexpr uint8_t CharTypes[] = {
+ 0x00, 0x00, 0x00, 0x00, // Control characters
----------------
klimek wrote:
> Finding bugs in these will be hard :)
Ack :-(
================
Comment at: clangd/FuzzyMatch.cpp:120
+ 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, // Bytes over 127 -> Lower.
+ 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, // This includes UTF-8.
+ 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55,
----------------
klimek wrote:
> Can you expand in the comment why this works for utf-8?
Done. It doesn't really "work", so much as we just give up...
================
Comment at: clangd/FuzzyMatch.cpp:212
+ ScoreT S = 0;
+ // Bonus: pattern is part of a word prefix.
+ if (P == W)
----------------
klimek wrote:
> Why does P == W imply that?
Every pattern character must match in order, so a match with P < W is impossible, and P == W means the match is perfect so far.
(Also explained in the comment)
================
Comment at: clangd/FuzzyMatch.cpp:215
+ ++S;
+ // Bonus: case matches, or an asserted word break matches an actual.
+ if (Pat[P] == Word[W] || (PatRole[P] == Head && WordRole[W] == Head))
----------------
klimek wrote:
> This is the first time the term "asserted word break" shows up, perhaps explain this when explaining the roles.
Rephrased the comment here to use the familiar terminology.
================
Comment at: clangd/FuzzyMatch.h:39
+
+private:
+ constexpr static int MaxPat = 63;
----------------
klimek wrote:
> I find most of the abbreviations here non-intuitive, and thus needing comments (Pat I get is for pattern :)
> N - what does it mean? Number of characters in the pattern? I'd use Length instead.
> LPat and LWord (no idea what L could stand for).
>
Comments added throughout.
> N - what does it mean? Number of characters in the pattern? I'd use Length instead.
`WordLength` is too long for something so common I think, these each have >20 refs. Changed `NWord` -> `WordN` which I think reads better - `Word` and `WordN` form a nice pair.
(N rather than L for length because of confusion with Lower)
Changed `LWord` to `LowWord` etc.
================
Comment at: clangd/FuzzyMatch.h:50
+
+ int NPat, NWord;
+ char Pat[MaxPat];
----------------
klimek wrote:
> I'd use a StringRef instead, and call the storage *Storage or something.
What's the goal here?
I have a couple of objections to this:
- if you actually use StringRef[] to access the data, now you've got a gratuitous indirection everywhere
- For `LPat`/`LWord` too? Now we have *more* members than in the first place, and two ways to write each bounds check.
If the intent is to clean up the places where I construct `StringRef(Word, NWord)` explicitly, adding `StringRef word()` would certainly make sense.
https://reviews.llvm.org/D40060
More information about the cfe-commits
mailing list