[PATCH] D50517: [clangd] Generate incomplete trigrams for the Dex index
Eric Liu via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Aug 10 10:28:41 PDT 2018
ioeric added inline comments.
================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:31
+/// use it as the special symbol.
+const auto END_MARKER = '$';
+
----------------
nit: s/auto/char/
Maybe just use `static` instead of an anonymous namespace just for this.
================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:83
+
+ bool FoundFirstHead = false;
+ // Iterate through valid seqneces of three characters Fuzzy Matcher can
----------------
It's probably unclear what `FoundFirstHead` and `FoundSecondHead` are for to readers without context (i.e. we are looking first two HEADs). I think it's would be cleaner if we handle this out side of the look e.g. record the first head in the `Next` initialization loop above.
================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:138
DenseSet<Token> UniqueTrigrams;
std::deque<char> Chars;
----------------
nit: `Chars` is only used in the else branch?
================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:143
+ if (ValidSymbolsCount < 3) {
+ std::string Symbols = {{END_MARKER, END_MARKER, END_MARKER}};
+ if (LowercaseQuery.size() > 0)
----------------
I think this can be simplified as:
```
std::string Chars = LowercaseQuery.substr(std::min(3, LowercaseQuery.size()));
Chars.append(END_MARKER, 3-Chars.size());
UniqueTrigrams.insert(Token(Token::Kind::Trigram, Chars));
```
also nit: avoid using the term "symbol" here.
================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:167
- Chars.push_back(LowercaseQuery[I]);
+ if (Chars.front() == END_MARKER)
+ auto Trigram = Token(Token::Kind::Trigram,
----------------
When would this happen? And why are we reversing the trigram and throwing it away?
================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.h:59
+/// * Bigram with the first two symbols (if availible), same logic applies.
+///
/// Note: the returned list of trigrams does not have duplicates, if any trigram
----------------
ioeric wrote:
> I think the comment can be simplified a bit:
> ```
> This also generates incomplete trigrams for short query scenarios:
> * Empty trigram: "$$$"
> * Unigram: the first character of the identifier.
> * Bigrams: a 2-char prefix of the identifier and a bigram of the first two HEAD characters (if it exists).
> ```
nit: and the previous paragraph `Special kind of trigrams ...` is redundant now IMO.
https://reviews.llvm.org/D50517
More information about the cfe-commits
mailing list