[PATCH] D50337: [clangd] DexIndex implementation prototype

Eric Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Aug 17 06:27:48 PDT 2018


ioeric added inline comments.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:86
+  const auto TrigramTokens = generateIdentifierTrigrams(Req.Query);
+  TopLevelChildren.push_back(createAnd(createTrigramIterators(TrigramTokens)));
+
----------------
`createTrigramIterators` and `createScopeIterators` use `InvertedIndex`, so they should be called in the critical section. 

Maybe 

```
/// ....
/// Requires: Called from a critical section of `InvertedIndex`.
std::vector<std::unique_ptr<Iterator>> createTrigramIterators(
    const llvm::DenseMap<Token, PostingList> &InvertedIndex, TrigramTokens);
```


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:102
+    // final score might not be retrieved otherwise.
+    const unsigned ItemsToRetrieve = 100 * Req.MaxCandidateCount;
+    SymbolDocIDs = consume(*QueryIterator, ItemsToRetrieve);
----------------
Maybe add a `FIXME` about finding the best pre-scoring retrieval threshold. I'm not sure if `100*MaxCandidateCount` would work well in practice. For example, if the `MaxCandidateCount` is small e.g. `5`, then the retrieval threshold would only be 50, which still seems to be too small.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:110
+      const auto Score = Filter.match(Sym->Name);
+      assert(Score.hasValue() && "Matched symbol has all Fuzzy Matching "
+                                 "trigrams, it should match the query");
----------------
I don't think this assertion is well justified. I think we should just skip if the fuzzy match fails.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:118
+    // in the order of descending score.
+    std::sort(begin(Scores), end(Scores),
+              [](const std::pair<float, const Symbol *> &LHS,
----------------
I think we should use a priority queue so that we don't need to store/sort all retrieved symbols.


================
Comment at: clang-tools-extra/unittests/clangd/DexIndexTests.cpp:367
+// FIXME(kbobyrev): Use 3+ symbols long names so that trigram index is used.
+TEST(MergeDexIndex, Lookup) {
+  DexIndex I, J;
----------------
This seems to be testing `mergeIndex(...)` which is not relevant in this patch? 


================
Comment at: clang-tools-extra/unittests/clangd/DexIndexTests.cpp:388
+
+// FIXME(kbobyrev): Add more tests on DexIndex? Right now, it's mostly a wrapper
+// around DexIndex, adopting tests from IndexTests.cpp sounds reasonable.
----------------
Now that we are handling both short and long queries. I think we can address this FIXME in this patch?


================
Comment at: clang-tools-extra/unittests/clangd/DexIndexTests.cpp:540
+
+TEST(DexMergeIndexTest, Lookup) {
+  DexIndex I, J;
----------------
Again, `mergeIndex(...)` is not interesting here.


https://reviews.llvm.org/D50337





More information about the cfe-commits mailing list