[PATCH] D50517: [clangd] Generate incomplete trigrams for the Dex index

Eric Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Aug 10 10:28:41 PDT 2018


ioeric added inline comments.


================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:31
+/// use it as the special symbol.
+const auto END_MARKER = '$';
+
----------------
nit: s/auto/char/

Maybe just use `static` instead of an anonymous namespace just for this.


================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:83
+
+  bool FoundFirstHead = false;
+  // Iterate through valid seqneces of three characters Fuzzy Matcher can
----------------
It's probably unclear what `FoundFirstHead` and `FoundSecondHead` are for to readers without context (i.e. we are looking first two HEADs). I think it's would be cleaner if we handle this out side of the look e.g. record the first head in the `Next` initialization loop above.


================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:138
   DenseSet<Token> UniqueTrigrams;
   std::deque<char> Chars;
 
----------------
nit: `Chars` is only used in the else branch?


================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:143
+  if (ValidSymbolsCount < 3) {
+    std::string Symbols = {{END_MARKER, END_MARKER, END_MARKER}};
+      if (LowercaseQuery.size() > 0)
----------------
I think this can be simplified as:
```
std::string Chars = LowercaseQuery.substr(std::min(3, LowercaseQuery.size()));
Chars.append(END_MARKER, 3-Chars.size());
UniqueTrigrams.insert(Token(Token::Kind::Trigram, Chars));
```

also nit: avoid using the term "symbol" here.


================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:167
 
-    Chars.push_back(LowercaseQuery[I]);
+        if (Chars.front() == END_MARKER)
+          auto Trigram = Token(Token::Kind::Trigram,
----------------
When would this happen? And why are we reversing the trigram and throwing it away?


================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.h:59
+/// * Bigram with the first two symbols (if availible), same logic applies.
+///
 /// Note: the returned list of trigrams does not have duplicates, if any trigram
----------------
ioeric wrote:
> I think the comment can be simplified a bit:
> ```
> This also generates incomplete trigrams for short query scenarios:
>   * Empty trigram: "$$$"
>   * Unigram: the first character of the identifier.
>   * Bigrams: a 2-char prefix of the identifier and a bigram of the first two HEAD characters (if it exists).
> ```
nit: and the previous paragraph `Special kind of trigrams ...` is redundant now IMO.


https://reviews.llvm.org/D50517





More information about the cfe-commits mailing list