[PATCH] D51481: [clangd] Implement proximity path boosting for Dex
Eric Liu via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Sep 3 01:38:29 PDT 2018
ioeric added inline comments.
================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:36
+ Result.push_back(PathToken);
+ }
return Result;
----------------
nit: no need for braces. [llvm-style]
================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:126
+ std::vector<std::unique_ptr<Iterator>> BoostingIterators;
+ // This should generate proximity path boosting iterators for each different
+ // URI Scheme the Index is aware of.
----------------
I thought we wanted to take the first scheme that works, given that `URISchemes` is sorted by preference?
================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:137
+ BoostingIterators.push_back(
+ createBoost(create(It->second), PARENTS_TO_BOOST + 1 - P.second));
+ }
----------------
`PARENTS_TO_BOOST + 1 - P.second` seems to be too much boosting. We should align the boosting function with the one we use in clangd/Quality.cpp, e.g. proximity score should be would be in the range [0, 1], and we can boost by `1+proximity`.
================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:161
+ // Sort items using boosting score as the key.
+ // FIXME(kbobyrev): Consider merging this stage with the next one? This is
----------------
nit: in general, this should be a combination of relevance score (e.g. proximity) and quality score (e.g. #references).
================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.cpp:169
+ const std::pair<DocID, float> &RHS) {
+ return SymbolQuality.find((*Symbols)[LHS.first])->second *
+ LHS.second >
----------------
nit: `SymbolQuality[(*Symbols)[LHS.first]]`?
For performance reason, could we instead make `SymbolQuality` a vector indexed by `DocID`?
================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.h:42
public:
+ DexIndex(llvm::ArrayRef<std::string> URISchemes) : URISchemes(URISchemes) {}
+
----------------
nit: explicit?
================
Comment at: clang-tools-extra/clangd/index/dex/DexIndex.h:42
public:
+ DexIndex(llvm::ArrayRef<std::string> URISchemes) : URISchemes(URISchemes) {}
+
----------------
ioeric wrote:
> nit: explicit?
Add documentation about `URISchemes`?
================
Comment at: clang-tools-extra/clangd/index/dex/Token.cpp:18
+
+std::vector<Token> generateProximityPaths(llvm::StringRef URIPath,
+ size_t Limit) {
----------------
Add `FIXME` mentioning that the parsing and encoding here are not necessary and could potentially be optimized.
================
Comment at: clang-tools-extra/clangd/index/dex/Token.cpp:28
+ "Non-empty argument of generateProximityPaths() should be a valid URI.");
+ const auto Scheme = ParsedURI.get().scheme();
+ const auto Authority = ParsedURI.get().authority();
----------------
nit: `ParsedURI->scheme();`
================
Comment at: clang-tools-extra/clangd/index/dex/Token.cpp:32
+ // Get parent directory of the file with symbol declaration.
+ Path = llvm::sys::path::parent_path(Path);
+ while (!Path.empty() && Limit--) {
----------------
I think you need to set `posix` style explicitly here. On windows, the separator is '/'.
================
Comment at: clang-tools-extra/clangd/index/dex/Token.cpp:32
+ // Get parent directory of the file with symbol declaration.
+ Path = llvm::sys::path::parent_path(Path);
+ while (!Path.empty() && Limit--) {
----------------
ioeric wrote:
> I think you need to set `posix` style explicitly here. On windows, the separator is '/'.
I think the original path should also be used as a proximity path.
================
Comment at: clang-tools-extra/clangd/index/dex/Token.cpp:51
+ // should be just skipped.
+ if (!static_cast<bool>(URI)) {
+ // Ignore the error.
----------------
You could use `llvm::consumeError(URI.takeError())`.
================
Comment at: clang-tools-extra/clangd/index/dex/Token.h:45
+ // FIXME(kbobyrev): Storing Data hash would be more efficient than storing raw
+ // strings.
enum class Kind {
----------------
nit: For example, .... (URI is an good example I think)
================
Comment at: clang-tools-extra/clangd/index/dex/Token.h:58
+ ///
+ /// Data stores URI the parent directory of symbol declaration location.
+ ///
----------------
Do we also consider the original URI as a proximity path?
================
Comment at: clang-tools-extra/clangd/index/dex/Token.h:94
+///
+/// When Limit is set, returns no more than Limit tokens.
+std::vector<Token>
----------------
nit: also explain how Limit is used? e.g. closer parent directories are preferred?
================
Comment at: clang-tools-extra/clangd/index/dex/Token.h:96
+std::vector<Token>
+generateProximityPaths(llvm::StringRef Path,
+ size_t Limit = std::numeric_limits<size_t>::max());
----------------
As we are assuming that `Path` is an URI. We should be more explicit in the function names.
================
Comment at: clang-tools-extra/clangd/index/dex/Token.h:104
+ llvm::ArrayRef<std::string> URISchemes,
+ size_t Count = 5);
+
----------------
I'm wondering whether we should make `Count` customizable or keep it as implementation detail. `generateProximityPaths` and `generateQueryProximityPaths` should have the same Limit/Count in practice, and I'm not sure if users could usefully customize the value.
https://reviews.llvm.org/D51481
More information about the cfe-commits
mailing list