[PATCH] D51481: [clangd] Implement proximity path boosting for Dex

Eric Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Sep 4 07:54:04 PDT 2018


ioeric added a comment.

There are a few changes/refactorings which I would suggest splitting from this patch, as they would require more thoughts and seem unrelated in this patch. Please see the inline comments :)



================
Comment at: clang-tools-extra/clangd/Quality.h:130
+/// directories URIs.
+float parentProximityScore(unsigned Distance);
+
----------------
This should take in a `FileDistanceOptions` to allow customized up/down costs. To support general cases, this should probably take `UpDistance`, `DownDistance`, and `FileDistanceOptions`.


================
Comment at: clang-tools-extra/clangd/Quality.h:130
+/// directories URIs.
+float parentProximityScore(unsigned Distance);
+
----------------
ioeric wrote:
> This should take in a `FileDistanceOptions` to allow customized up/down costs. To support general cases, this should probably take `UpDistance`, `DownDistance`, and `FileDistanceOptions`.
Not sure if this is the right name. Two paths don't need to be parent and child (if we also consider down cost).  Maybe just `proximityScore`?


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:42
 
+struct SymbolAndScore {
+  const Symbol *Sym;
----------------
Maybe `ScoredSymbol`? Is this used outside of `buildIndex`? If not, maybe just inline it? Or simply use a `std::pair`?


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:59
     LookupTable[Sym->ID] = Sym;
-    SymbolQuality[Sym] = quality(*Sym);
+    SymbolAndScores[I] = {Sym, static_cast<float>(quality(*Sym))};
   }
----------------
We should fix `quality` to return `float` for consistency. 


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:59
     LookupTable[Sym->ID] = Sym;
-    SymbolQuality[Sym] = quality(*Sym);
+    SymbolAndScores[I] = {Sym, static_cast<float>(quality(*Sym))};
   }
----------------
ioeric wrote:
> We should fix `quality` to return `float` for consistency. 
nit: Instead of reconstruct the structure, I'd probably set the fields manually.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:65
+  std::sort(begin(SymbolAndScores), end(SymbolAndScores),
+            std::greater<SymbolAndScore>());
+
----------------
Is the `operator>` overload used anywhere besides here? If not, I'd suggest inlining the overload.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:129
+    const auto ProximityURIs =
+        generateQueryProximityURIs(ProximityPath, URISchemes);
+    for (size_t Distance = 0; Distance < ProximityURIs.size(); ++Distance) {
----------------
IIUC, the assumption here is that `generateQueryProximityURIs` returns proximity tokens of parent directories sorted by distance. This seems to be making assumption about the implementation. `generateQueryProximityURIs` should return the distance explicitly along with the tokens.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:176
+  std::sort(begin(IDAndScores), end(IDAndScores),
+            std::greater<DocIDAndScore>());
 
----------------
Just inline the comparator for clarity.


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:179
   // Retrieve top Req.MaxCandidateCount items.
-  std::priority_queue<std::pair<float, const Symbol *>> Top;
-  for (const auto &P : SymbolDocIDs) {
-    const DocID SymbolDocID = P.first;
+  const size_t ItemsToScore = 10 * Req.MaxCandidateCount;
+  TopN<DocIDAndScore> Top(Req.MaxCandidateCount);
----------------
This is adding another staging of filtering. It seems to be an optimization that's not specific to the proximity path change and would probably require more careful review. Can we split this out of this patch for now?


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.h:80
 
+  /// Stores symbols in the descending order using symbol quality as the key.
   std::vector<const Symbol *> Symbols;
----------------
nit: `Sorted in the descending order of symbol quality.`


================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.h:83
+  /// Maps DocID to the quality of underlying symbol.
+  std::vector<float> SymbolQuality;
   llvm::DenseMap<SymbolID, const Symbol *> LookupTable;
----------------
nit: `SymbolQuality[I] is the quality of Symbols[I]. `


================
Comment at: clang-tools-extra/clangd/index/dex/Iterator.h:139
 /// to acquire preliminary scores of requested items.
-std::vector<std::pair<DocID, float>> consume(Iterator &It);
+std::vector<DocIDAndScore> consume(Iterator &It);
 
----------------
I'm not sure about this refactoring. It seems to be out of the scope. Can we split it out and still return pair for now?


================
Comment at: clang-tools-extra/clangd/index/dex/Token.cpp:36
+  size_t Limit = 5;
+  while (!Path.empty() && Limit--) {
+    // FIXME(kbobyrev): Parsing and encoding path to URIs is not necessary and
----------------
ioeric wrote:
> For the original URI, we could simply add it to the result outside of the loop (and `--Limit`) to save one iteration?
nit: `(--Limit > 0)` for clarity.


================
Comment at: clang-tools-extra/clangd/index/dex/Token.h:100
+std::vector<Token>
+generateQueryProximityURIs(llvm::StringRef AbsolutePath,
+                           llvm::ArrayRef<std::string> URISchemes);
----------------
As mentioned in another comment, this should return distance along with tokens (instead of assuming consecutive parents and distances).


https://reviews.llvm.org/D51481





More information about the cfe-commits mailing list