[PATCH] D51481: [clangd] Implement proximity path boosting for Dex

Eric Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Aug 31 06:47:53 PDT 2018


ioeric added inline comments.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:137
+          BoostingIterators.push_back(
+              createBoost(create(It->second), P.second + 10));
+      }
----------------
Could you comment on `P.second + 10` here? It sounds like a lot of boost.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:157
+    const size_t ItemsToRetrieve =
+        getRetrievalItemsMultiplier() * Req.MaxCandidateCount;
     auto Root = createLimit(move(QueryIterator), ItemsToRetrieve);
----------------
Again, the multiplier change seems irrelevant in this patch.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:163
+    // Sort items using boosting score as the key.
+    std::sort(begin(SymbolDocIDs), end(SymbolDocIDs),
+              [](const std::pair<DocID, float> &LHS,
----------------
Shouldn't we sort them by `Quality * boost`?


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.h:64
 
-private:
+  /// For fuzzyFind() Dex retrieves getRetrievalItemsMultiplier() more items
+  /// than requested via Req.MaxCandidateCount in the first stage of filtering.
----------------
kbobyrev wrote:
> ioeric wrote:
> > Why are values of multipliers interesting to users? Could these be implementation details in the cpp file?
> Actually, my understanding is that users might want to have full access to the multipliers at some point to control the performance/quality ratio.
> 
> And it's also useful for the tests: otherwise the last one would have to hard-code number of generated symbols to ensure only boosted ones are in the returned list. It would have to be updated each time these internal multipliers are and we might update them often/make logic less clear (by allowing users to control these parameters).
> 
> What do you think?
I'm not sure if users can usefully control this multipliers without understanding the whole implementation.  Do we have a use case for these APIs now? If not, I'd suggesting removing them from this patch. It also seems to be out of the scope of this patch.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.h:73
+private:
+private:
   mutable std::mutex Mutex;
----------------
Double `private`?


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.h:87
+
+  std::vector<std::string> URISchemes /*GUARDED BY(Mutex)*/;
+
----------------
I think `URISchemes` should be initialized when creating a `DexIndex` and remain the same. So this could be `const` without mutex guard.


================
Comment at: clang-tools-extra/clangd/index/dex/Token.h:54
     Scope,
+    /// Path to symbol declaration.
+    ///
----------------
As this is called `Path`, I'd try to decouple it from URIs, so that we don't need special handling of `scheme` etc in the implementation. We might want to canonicalize URI to "path" like `/file:/path/to/something/` so that we could simply treat it as normal paths, and the token could potentially handle actual actual file paths.


================
Comment at: clang-tools-extra/clangd/index/dex/Token.h:97
 
+/// Returns Search Token for each parent directory of given Path. Should be used
+/// within the index build process.
----------------
I couldn't find implementations of these two functions in this patch?


https://reviews.llvm.org/D51481





More information about the cfe-commits mailing list