[PATCH] D113995: [clangd] Dex Trigrams: Improve query trigram generation

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Sun Nov 28 15:08:26 PST 2021


sammccall added inline comments.


================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:71
   }
-  // Emit short-query trigrams: FooBar -> f, fo, fb.
-  if (!LowercaseIdentifier.empty())
-    Out(Trigram(LowercaseIdentifier[0]));
-  if (LowercaseIdentifier.size() >= 2)
-    Out(Trigram(LowercaseIdentifier[0], LowercaseIdentifier[1]));
-  for (size_t I = 1; I < LowercaseIdentifier.size(); ++I)
-    if (Roles[I] == Head) {
-      Out(Trigram(LowercaseIdentifier[0], LowercaseIdentifier[I]));
+  // Short queries semantics are different: emit incomplete trigrams for them.
+  //
----------------
It'd be good to describe what the semantics actually are somewhere.
As it is, this comment describes the algorithm used to generate the trigrams, and the other function refers to this one.


================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:74
+  // First, the first separator if it is preceding the alphanumeric symbols.
+  size_t FirstSeparator = std::numeric_limits<size_t>::max();
+  for (size_t I = 0; I < LowercaseIdentifier.size(); ++I) {
----------------
This seems overly complicated.
If a separator precedes the alphanums, then it's at the front.
Moreover its Next table seems to be set up correctly.

So I believe this all amounts to: if the first char is a separator, then we allow starting our partial trigrams at 0, first head, second head. Else just first head & second head. And we just follow the next table.


================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:87
+  //
+  // - First head
+  // - Fist head + subsequent tail
----------------
This looks a lot like the Next table. Can't we use that? (Second head + third head is missing, but I don't see why)


================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:147
+
+  unsigned ValidSymbols =
+      llvm::count_if(Roles, [](CharRole R) { return R == Head || R == Tail; });
----------------
both "valid" and "symbols" are confusing here.

Do we actually need this variable?
Seems like we can detect the case at the end, if we generated no trigrams.


================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:152
+  if (ValidSymbols < 3) {
+    // If a bigram can't be formed, we might prepend a separator.
+    std::string Letters = Roles.front() == Separator && ValidSymbols < 2
----------------
I'd suggest always including the separator, and then using StringRef(Letters).take_back(2). Then we don't need ValidSymbols here either.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113995/new/

https://reviews.llvm.org/D113995



More information about the cfe-commits mailing list