[PATCH] D40060: [clangd] Fuzzy match scorer
Ilya Biryukov via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Nov 29 03:09:01 PST 2017
ilya-biryukov added inline comments.
================
Comment at: clangd/FuzzyMatch.cpp:118
+ 0x00, 0x00, 0x00, 0x00, // Control characters
+ 0xff, 0xff, 0xff, 0xff, // Punctuation
+ 0x55, 0x55, 0xf5, 0xff, // Numbers->Lower, more Punctuation.
----------------
I'm not sure if we care, but maybe we should treat `+`, `-` and other symbols that could be in operator names (e.g., `operator +`) differently for C++.
Could also make sense for other languages with overloaded operators.
================
Comment at: clangd/FuzzyMatch.cpp:244
+ // Forbidden: matching the first pattern character in the middle of a segment.
+ if (!P && WordRole[W] == Tail)
+ return AwfulScore;
----------------
Maybe use `P == 0` instead? It's equivalent, but a bit easier to read if you think of `P` as an offset.
Totally subjective, though, it's fine to have it either way.
================
Comment at: clangd/FuzzyMatch.cpp:245
+ if (!P && WordRole[W] == Tail)
+ return AwfulScore;
+ ScoreT S = 0;
----------------
Does it mean I will get no matches in the following situation?
`Items = [printf, scanf]`
`Pattern = f`
It may be a bit confusing, since I do have a match, even though is terrible and it's ok to put those items very low in the list.
A more real example is:
`Items = [fprintf, fscanf]`
`Pattern = print`
Would `fprintf` match in that case? I think it should.
Another important one:
`Items = [istream, ostream]`
`Pattern = stream`
================
Comment at: clangd/FuzzyMatch.h:31
+public:
+ // Characters beyond MaxPat are ignored.
+ FuzzyMatcher(llvm::StringRef Pattern);
----------------
Maybe move the truncating logic into the clients? The users of this code are certainly better suited to report warnings/reject requests that are too large.
================
Comment at: clangd/FuzzyMatch.h:53
+
+ int PatN, WordN; // Length of pattern and word.
+ char Pat[MaxPat], Word[MaxWord]; // Raw pattern and word data.
----------------
Maybe we could split the data we store into two sections:
1. Pattern-specific data. Initialized on construction, never changed later.
2. Per-match data. Initialized per `match()` call.
Otherwise it is somewhat hard to certify whether everything is being initialized properly.
https://reviews.llvm.org/D40060
More information about the cfe-commits
mailing list