[PATCH] D50337: [clangd] DexIndex implementation prototype

Eric Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Aug 8 07:01:44 PDT 2018


ioeric added inline comments.


================
Comment at: clang-tools-extra/clangd/index/MemIndex.h:45
   // Index is a set of symbols that are deduplicated by symbol IDs.
-  // FIXME: build smarter index structure.
   llvm::DenseMap<SymbolID, const Symbol *> Index;
----------------
I think this FIXME still applies here. This can probably go away when we completely get rid of MemIndex.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:31
+
+    // Symbols are sorted by symbol qualities so that items in the posting lists
+    // are stored in the descending order of symbol quality.
----------------
Couldn we build the inverted index also outside of the critical section? As this blocks ongoing index requests, we should do as little work as possible in the CS.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:40
+    for (DocID SymbolRank = 0; SymbolRank < Symbols->size(); ++SymbolRank) {
+      const auto Sym = (*Symbols)[SymbolRank];
+      for (const auto &Token : generateSearchTokens(*Sym)) {
----------------
nit: `const auto * Sym`


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:57
+  FuzzyMatcher Filter(Req.Query);
+  bool More;
+  {
----------------
nit: Initialize to false


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:67
+    // also implemented in that approach.
+    // FIXME(kbobyrev): Code sharing is not very pleasant here, is it better to
+    // dispatch long and short queries to internal helper functions?
----------------
I'm not quite sure what this FIXME means. What code do you want to share between them? 

But we do want to refactor the code a bit to make it easier to follow.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:92
+      // otherwise the iterator won't have any children and will be invalid.
+      if (ScopeIterators.size())
+        TopLevelChildren.push_back(createOr(move(ScopeIterators)));
----------------
Can we avoid creating scope iterators in the first place if they are not going to be used?


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:98
+      // Req.MaxCandidateCount. That would require implementing Limit iterator.
+      SymbolDocIDs = consume(*QueryIterator);
+      More = SymbolDocIDs.size() > Req.MaxCandidateCount;
----------------
It seems that the lack of limiting iterators can make the query very inefficient. Maybe we should implement the limiting iterators before getting the index in, in case people start using it before it's ready?


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:102
+      // FIXME(kbobyrev): Don't score items twice.
+      for (const auto &Pair : Index) {
+        const auto Sym = Pair.second;
----------------
Could you document what the approach you are taking to handle short queries? It seems that this can be very expensive, for example, if all matching symbols are at the end of the DenseMap. Short queries happen very often, so we need to make sure index handles them efficiently.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:102
+      // FIXME(kbobyrev): Don't score items twice.
+      for (const auto &Pair : Index) {
+        const auto Sym = Pair.second;
----------------
ioeric wrote:
> Could you document what the approach you are taking to handle short queries? It seems that this can be very expensive, for example, if all matching symbols are at the end of the DenseMap. Short queries happen very often, so we need to make sure index handles them efficiently.
Did you mean to iterate on `Symbols` which is sorted by quality?


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.h:8
+//
+//===----------------------------------------------------------------------===//
+
----------------
I think this file could use some high-level documentation.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.h:27
+public:
+  void build(std::shared_ptr<std::vector<const Symbol *>> Symbols);
+
----------------
There is some assumption about `Symbols` (like`MemIndex::build`). Please add documentation.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.h:31
+  // out to Index.h?
+  static std::unique_ptr<SymbolIndex> build(SymbolSlab Slab);
+
----------------
Do we need this for this patch? If not, I'd suggest leaving it out and revisit when we actually need it (e.g. when replacing MemIndex); otherwise, I think what we want is basically a function that takes a `SymbolSlab` and returns `std::shared_ptr<std::vector<const Symbol *>>`. This can probably live in Index.h as you suggested.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.h:47
+  std::shared_ptr<std::vector<const Symbol *>> Symbols;
+  llvm::DenseMap<SymbolID, const Symbol *> Index;
+  mutable std::mutex Mutex;
----------------
Unlike `MemIndex` where this is used as an actual index, here it's simply a lookup table, IIUC? Maybe just `SymbolsByID` or `lookupTable`?


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.h:49
+  mutable std::mutex Mutex;
+  llvm::DenseMap<Token, PostingList> InvertedIndex;
+};
----------------
This can use some comments. Could be useful for people who are not familiar with inverted index.


================
Comment at: clang-tools-extra/clangd/index/dex/Token.h:84
 
+std::vector<Token> generateSearchTokens(const Symbol &Sym);
+
----------------
Please document this function.


https://reviews.llvm.org/D50337





More information about the cfe-commits mailing list