[PATCH] D52808: [cland] Dex: fix/simplify trigram generation

Eric Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Oct 4 03:05:39 PDT 2018


ioeric added a comment.

> Generate more short-query trigrams, e.g. "AbcDefGhi" now yields "d" and "ag".

I am concerned about the impact on the size of posting lists (we can measure) and retrieval quality by adding more incomplete trigrams.

> This is effectively required by LSP, having "ag" not match but "agh" match will lead to glitches due to client-side filtering.

It seems hard to make index behave the same as LSP clients, considering all the ranking signals we have; there can be other reasons that cause `having "ag" not match but "agh" match`. And it seems reasonable to assume that you would get better symbols as you type more characters (i.e. stronger relevance signal).

> Drop leading-punctuation short-query trigrams. Nice idea, but current implementation is broken (competes with non-punctuation short query trigrams).

Could you elaborate how this is broken? We should probably fix it instead of removing it. `__` not matching `__some_macro` sounds like a regression.



================
Comment at: unittests/clangd/DexTests.cpp:366
 
+  auto X = generateIdentifierTrigrams("abc_defGhij__klm");
+  llvm::sort(X,
----------------
nit: remove?


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D52808





More information about the cfe-commits mailing list