[PATCH] D50337: [clangd] DexIndex implementation prototype

Fri Aug 10 03:02:10 PDT 2018

ioeric added inline comments.

================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:29
+  // might be more efficient.
+  std::sort(begin(*Syms), end(*Syms), [](const Symbol *LHS, const Symbol *RHS) {
+    return quality(*LHS) > quality(*RHS);
----------------
Calculating `quality` on each comparison can also get expensive. I think we could store the quality.

================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:37
+    for (const auto &Token : generateSearchTokens(*Sym)) {
+      if (!TempInvertedIndex.count(Token))
+        TempInvertedIndex[Token] = {SymbolRank};
----------------
nit: use `try_emplace` to save one lookup? 

================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:62
+  {
+    std::lock_guard<std::mutex> Lock(Mutex);
+    More = Req.Query.size() >= 3 ? fuzzyFindLongQuery(Callback, Req)
----------------
I think we should let the helpers grab the lock. Some preparatory work doesn't require lock.

FWIW, I think the separation of short and long code paths is heading in a wrong direction. And it's also pretty hard to find a very clean abstraction here. For example, there is some overlaps in both functions, and `useCallback` seems a bit awkward. 

As all this would probably go away after D50517 anyway, I think we could try to get that patch landed and incorporate it into this patch? If you prefer to get this patch in first, please add `FIXME` somewhere to make it clear that the divergence is temporary.

================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:82
+    const auto *Sym = (*Symbols)[ID];
+    const auto Score = Filter.match(Sym->Name);
+    // Match scopes from the query.
----------------
nit: avoid `auto`here as the type of `Score` is not obvious here.

================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:84
+    // Match scopes from the query.
+    // FIXME(kbobyrev): Use scope filtering before and iterate through the
+    // posting list with the result. However, since the query might not contain
----------------
Put this `FIXME` on the for-loop level as iterating all symbols is the problem.

And I think the `FIXME` could simply be 
```
FIXME(...): This can be very expensive. We should first filter symbols by scopes (with scope iterators).
```

We can leave out the details/options as they are not interesting for most readers (as they are mostly concerns about scope filtering). In general, we should try to keep comments in documentation brief to keep the code shorter and easier to read.

================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:94
+      continue;
+    if (Score.hasValue()) {
+      // If the requested number of results is already retrieved but there
----------------
nit: `if (Score)`

================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:95
+    if (Score.hasValue()) {
+      // If the requested number of results is already retrieved but there
+      // are other symbols which match the criteria, just stop processing
----------------
The code here is trivial, so the comment seems redundant. 

================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:102
+      }
+      Scores.push_back(std::make_pair(-*Score * quality(*Sym), Sym));
+    }
----------------
For clarity, `- (*Score) * quality`. 

Please also comment on why we negate the number here?

================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:175
+  // Sort retrieved symbol by Fuzzy Matching score.
+  std::sort(begin(Scores), end(Scores));
+
----------------
Shouldn't `Scores` already be sorted for both short query and long query?

================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.h:12
+// over symbol tokens, such as fuzzy matching trigrams, scopes, types, etc. Dex
+// is supposed to be a drop-in replacement of MemIndex in the future: while
+// consuming more memory and having longer build stage due to preprocessing, Dex
----------------
nit: we don't really need to mention MemIndex here as it's likely to be replaced soon, and the comment will outdated then.

It's okay to mention why `Dex` is cool though :)

================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.h:60
+private:
+  /// Constructs iterators over tokens extracted from the query and exhausts it
+  /// while applying Callback to each symbol in the order of decreasing quality
----------------
nit: The comment about implementation details should go with the implementation. Same below.

================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.h:65
+                          const FuzzyFindRequest &Req) const;
+  /// Handles fuzzy matching requests with Req.Query.size() <= 3. This approach
+  /// currently does not use trigram index and query iterators and will be
----------------
`Req.Query.size() < 3`?

================
Comment at: clang-tools-extra/clangd/index/dex/Token.h:84

+std::vector<Token> generateSearchTokens(const Symbol &Sym);
+
----------------
ioeric wrote:
> Please document this function.
nit: I'd try to avoid tying documentation to the current state as it can easily get outdated. If you want to include future changes, consider making it more explicit. For example:
```
Returns the tokens which are given symbol's characteristics. For example, ... trigrams and scopes.
FIXME: support more tokens types:
- path proxmity
- ...
```

================
Comment at: clang-tools-extra/clangd/index/dex/Token.h:88
+/// types of tokens such as symbol type (if applicable).
+std::vector<Token> generateSearchTokens(const Symbol &Sym);
+
----------------
Dependency on `Index.h` form `Token.h` seems strange. I think this function should probably belong in `DexIndex.cpp`? Do we expect this to be used outside of `DexIndex`?

https://reviews.llvm.org/D50337