[PATCH] D113995: [clangd] Dex Trigrams: Improve query trigram generation
Sam McCall via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Sun Nov 28 15:08:26 PST 2021
sammccall added inline comments.
================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:71
}
- // Emit short-query trigrams: FooBar -> f, fo, fb.
- if (!LowercaseIdentifier.empty())
- Out(Trigram(LowercaseIdentifier[0]));
- if (LowercaseIdentifier.size() >= 2)
- Out(Trigram(LowercaseIdentifier[0], LowercaseIdentifier[1]));
- for (size_t I = 1; I < LowercaseIdentifier.size(); ++I)
- if (Roles[I] == Head) {
- Out(Trigram(LowercaseIdentifier[0], LowercaseIdentifier[I]));
+ // Short queries semantics are different: emit incomplete trigrams for them.
+ //
----------------
It'd be good to describe what the semantics actually are somewhere.
As it is, this comment describes the algorithm used to generate the trigrams, and the other function refers to this one.
================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:74
+ // First, the first separator if it is preceding the alphanumeric symbols.
+ size_t FirstSeparator = std::numeric_limits<size_t>::max();
+ for (size_t I = 0; I < LowercaseIdentifier.size(); ++I) {
----------------
This seems overly complicated.
If a separator precedes the alphanums, then it's at the front.
Moreover its Next table seems to be set up correctly.
So I believe this all amounts to: if the first char is a separator, then we allow starting our partial trigrams at 0, first head, second head. Else just first head & second head. And we just follow the next table.
================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:87
+ //
+ // - First head
+ // - Fist head + subsequent tail
----------------
This looks a lot like the Next table. Can't we use that? (Second head + third head is missing, but I don't see why)
================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:147
+
+ unsigned ValidSymbols =
+ llvm::count_if(Roles, [](CharRole R) { return R == Head || R == Tail; });
----------------
both "valid" and "symbols" are confusing here.
Do we actually need this variable?
Seems like we can detect the case at the end, if we generated no trigrams.
================
Comment at: clang-tools-extra/clangd/index/dex/Trigram.cpp:152
+ if (ValidSymbols < 3) {
+ // If a bigram can't be formed, we might prepend a separator.
+ std::string Letters = Roles.front() == Separator && ValidSymbols < 2
----------------
I'd suggest always including the separator, and then using StringRef(Letters).take_back(2). Then we don't need ValidSymbols here either.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D113995/new/
https://reviews.llvm.org/D113995
More information about the cfe-commits
mailing list