[PATCH] D53131: [clangd] Support scope proximity in code completion.

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Oct 12 04:50:01 PDT 2018


sammccall added a comment.

(BTW I wouldn't expect to see improvements on eval here, as



================
Comment at: clangd/CodeComplete.cpp:558
+    if (const auto *NS = dyn_cast<NamespaceDecl>(Ctx))
+      return NS->getQualifiedNameAsString() + "::";
+  return llvm::None;
----------------
ioeric wrote:
> sammccall wrote:
> > does this do the right thing if it's anonymous or inline, or a parent is?
> > 
> > For indexing, we call a function in AST.h, we should probably do something similar.
> > 
> > The catch is that function relies on NamedDecl::printQualifiedName, maybe we need to factor out the relevant part so we can call it on a DeclContext.
> Good catch.
> 
> `NamedDecl::printQualifiedName` doesn't skip the `(anonymous)` part if the decl itself is anonymous, even with `SuppressUnwrittenScope` set. So I think we should explicitly filter those out here and then call `printQualifiedName`.
can you move this function to AST.h? e.g. `printScope`? It seems like an obvious peer to `printQualifiedName`.


================
Comment at: clangd/Quality.cpp:246
 
+float QueryScopeProximity::proximity(llvm::StringRef Scope) const {
+  if (QueryScopes.empty())
----------------
ioeric wrote:
> sammccall wrote:
> > If you're going to decide these numbers directly...
> > 
> > a) I think you should just return a multiplier here, rather than a 0-1 score
> > b) we should more aggressively downrank non-preferred symbols: currently by only ~1/3 vs symbols from a preferred scopes
> > 
> > e.g. I'd suggest returning 1, 2, 1.5, 1, 1, 1.5, 0.3 or similar
> > a) I think you should just return a multiplier here, rather than a 0-1 score.
> I tried this. Unfortunately, allowing <1 multiplier would make the default value (for sema proximity) tricky. And [0,1] proximity score seems to be less confusing as it's consistent with the file proximity scale. If [0,1] isn't enough, we could still increase the upper bound?
> 
> > b) we should more aggressively downrank non-preferred symbols: currently by only ~1/3 vs symbols from a preferred scopes
> It's ~1/3 only for the global scope vs non-preferred symbols, which seems reasonable to me. I worry penalizing non-preferred scopes too much can make cross-namespace completion less useable.
> Unfortunately, allowing <1 multiplier would make the default value (for sema proximity) tricky. 

There are three types of symbols:
 - those we get from sema only. Here your patch decides to give max boost (I think?) because we don't trust scope comparison for all these cases.
 - those we get from index only. We choose a boost by matching the stated scope to the query scopes.
 - those we get from sema and index. We should *probably* treat these like Sema-only and give max boost.

Note the category of results which we have neither sema nor index for doesn't actually exist. The framework allows for it and we should respect that as best we can, but it can't be the top priority.

So can't we just change the SemaScopeProximityScore to `bool SemaSaysInScope` or something and evaluate as `SemaSaysInScope ? MaxBoost : QueryScopeProximity ? evaluateUsingQueryScopeProximity() : 1`?

> It's ~1/3 only for the global scope vs non-preferred symbols
Queried scopes (not most-preferred, and not the global scope!) get a boost of 1.6, while completely irrelevant scopes get a boost of 1. This is a relative penalty of only 37.5%.
(I think a more reasonable penalty would be in the range of 75% or 80%, i.e. a 3-4x boost for symbols that are in-scope)


================
Comment at: clangd/Quality.h:92
+  private:
+    std::vector<std::string> QueryScopes;
+};
----------------
ioeric wrote:
> sammccall wrote:
> > hmm, can/should we use FileProximity for this, and just transform the strings?
> > 
> > This isn't going to cache for symbols sharing a namespace, isn't going to handle "symbol is in a child namespace of an included namespace", and some other combinations.
> It seems to me that it'd be less trivial to associate scopes with up/down traverses. Currently, the query scopes contain all enclosing namespaces, so the distance seems less important here. In general, I think the characteristics of scope proximity (e.g. all enclosing namespaces are already in query scopes) allow us to get away with something more trivial than file proximity? 
> 
> > This isn't going to cache for symbols sharing a namespace.
> This seems to be less a concern if we are not calculating up/down distance.
> > isn't going to handle "symbol is in a child namespace of an included namespace"
> This can be covered by `SymbolScope.startswith(QueryScope)`? It might not work well for `using-namespace`s, but it's unclear how important that is.
> 
> Still happy to consider FileDistance-approach as I might be missing something.
> Currently, the query scopes contain all enclosing namespaces, so the distance seems less important here.

Sure. But there cases like `using namespace std` and then `_1` which should resolve to `std::placeholders::_1`.
Or being in namespace `clang::clangd` and wanting to use a symbol `clang::tooling::CompilationDatabase`.

> allow us to get away with something more trivial than file proximity?
Maybe we can. But we should only be trying to get away with it if it's some combination of {less complex, better, faster}. It doesn't seem clear to me that it's any of these.

>> This isn't going to cache for symbols sharing a namespace.
>This seems to be less a concern if we are not calculating up/down distance.
Again, less of a concern, but not obviously a non-issue. With this implementation, you compare the scope string of each symbol to every query scope. With FileDistance, you hash the scope string and then almost-always hit the cache.

> This can be covered by SymbolScope.startswith(QueryScope)?
You've updated the code for this, but now you're not giving nested-within-query-scope any penalty compared to directly-in-query-scope. All of this can be implemented, but I don't really understand why - does reusing the impl not seem simpler?


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D53131





More information about the cfe-commits mailing list