[PATCH] D44882: [clangd] Implementation of workspace/symbol request

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Mar 26 09:48:44 PDT 2018


sammccall added a comment.

Very nice! I'd like to reduce the scope of the initial patch, which seems to be possible, so we can review the details but not get bogged down too much.

In https://reviews.llvm.org/D44882#1048179, @malaperle wrote:

> In https://reviews.llvm.org/D44882#1048043, @ilya-biryukov wrote:
>
> > - Unconditionally storing much more symbols in the index might have subtle performance implications, especially for our internal use (the codebase is huge).  I bet that internally we wouldn't want to store the symbols not needed for completion, so we'll probably need a switch to disable storing them in the indexing implementation. But let's wait for Sam to take a look, he certainly has a better perspective on the issues there.
>
>
> I'm a bit surprised that internally you do not want symbols beyond the ones for completion.


FWIW, I think we do, it just needs to be done carefully (with the right semantic filters at query time, and metadata at indexing time).
Can we split this patch up (it's pretty large anyway) and land the `workspaceSymbols` implementation first, without changing the indexer behavior?

I think the indexer changes are in the right direction, but I think "ForCompletion" isn't quite enough metadata, and don't want to block everything on that.

> We have a lot more features in mind that will make the index much bigger, like adding all occurrences (maybe not in the current YAML though).

(tangential)
FWIW, I think *that* might be a case where we want quite a different API - I think this is a file-major index operation (handful of results per file) rather than a symbol-major one (results per symbol is essentially unbounded, e.g. `std::string`). At the scale of google's internal codebase, this means we need a different backend, and this may be the case for large open-source codebases too (I know chromium hits scaling issues with indexers they try). 
There are also other operations where results are files rather than symbols: which files include file x, etc.

> But having options to control the amount of information indexed sounds good to me as there can be a lot of different project sizes and there can be different tradeoffs. I had in mind to an option for including/excluding the occurrences because without them, you lose workspace/references, call hierarchy, etc, but you still have code completion and workspace/symbol, document/symbol, etc while making the index much smaller. But more options sounds good.

If we add options, someone has to actually set them.
Anything that's too big for google's internal index or similar can be filtered out in the wrapping code (mapreduce) if the indexer exposes enough information to do so.
We shouldn't have problems with the in-memory dynamic index.
Features that are off in the default static indexer configuration won't end up getting used much.

So I can see a use for filters where the command-line indexer (YAML) would run too slow otherwise, but a) let's cross that bridge when we come to it and b) there's lots of low-hanging fruit there - the YAML format itself should be considered a placeholder.

>> - Current `fuzzyFind` implementation can only match qualifiers strictly (i.e. st::vector won't match std::vector). Should we look into allowing fuzzy matches for this feature?  (not in this patch, rather in the long term).
> 
> I'm not sure, I'm thinking there should be a balance between how fuzzy it it and how much noise it generates. Right now I find it a bit too fuzzy since when I type "Draft" (to find DraftMgr), it picks up things like DocumentRangeFormattingParams. Adding fuzziness to the namespace would make this worse. Maybe with improved scoring it won't matter too much? I'll try FuzzyMatcher and see.

+1, I think experience with `workspaceSymbols` will help us answer this question.



================
Comment at: clangd/ClangdLSPServer.cpp:99
       Params.capabilities.textDocument.completion.completionItem.snippetSupport;
+  if (Params.capabilities.workspace && Params.capabilities.workspace->symbol &&
+      Params.capabilities.workspace->symbol->symbolKind)
----------------
This is the analogue to the one on `completion` that we currently ignore, and one on `symbol` corresponding to the `documentSymbol` call which isn't implemented yet.

I think the value of the extended types is low and it might be worth leaving out of the first version to keep complexity down.
(If we really want it, the supporting code (mapping tables) should probably be in a place where it can be reused by `documentSymbol` and can live next to the equivalent for `completion`...)


================
Comment at: clangd/ClangdLSPServer.h:95
   bool IsDone = false;
+  /// Indicates whether or not the index is available.
+  bool SymbolIndexAvailable = false;
----------------
This comment is just the var name, delete?


================
Comment at: clangd/ClangdLSPServer.h:96
+  /// Indicates whether or not the index is available.
+  bool SymbolIndexAvailable = false;
 
----------------
actually, I'm not sure this is worth tracking at all.
Dynamic index is on by default, and the worst that can happen is some dynamically-configured clients will show a menu option in a non-default configuration, but it won't actually work.
I don't think fixing this is worth the layering violation


================
Comment at: clangd/ClangdServer.h:161
 
+  void onWorkspaceSymbol(
+      StringRef Query, const clangd::WorkspaceSymbolOptions &Opts,
----------------
onWorkspaceSymbol -> workspaceSymbols


================
Comment at: clangd/WorkspaceSymbols.cpp:1
+//===--- WorkspaceSymbols.cpp ------------------------------------*- C++-*-===//
+//
----------------
as noted elsewhere, adjustKindToCapability etc need to be sharable, so I'm not sure this file structure quite works - it's named too narrowly to have any shared functionality, but if you pull those functions out there's only one left.

Maybe we could rename this FindSymbols.cpp, and then documentSymbols will also live here?


================
Comment at: clangd/WorkspaceSymbols.cpp:165
+            [](const SymbolInformation &A, const SymbolInformation &B) {
+              return A.name < B.name;
+            });
----------------
malaperle wrote:
> ilya-biryukov wrote:
> > We have `FuzzyMatcher`, it can produce decent match scores and is already used in completion.
> > Any reason not to use it here?
> I think it was giving odd ordering but let me try this again and at least document the reason.
I think it's going to be worth pulling out a scoring concept/function here, we can't reuse the priority from the sema codecompletion, but the server's going to provide useful scoring signals.
I'm not sure how much we want to keep this in sync with code completion overall, though.


================
Comment at: clangd/WorkspaceSymbols.h:37
+                    const SymbolIndex *const Index,
+                    llvm::IntrusiveRefCntPtr<vfs::FileSystem> VFS);
+
----------------
so reading all the files we return from the index is obviously going to be bad, we screwed up the interfaces :-)

>From discussing with people who've dealt with similar things, we're probably going to need to store both row/col and offset in the index. I'll try to fix this soon...
What you're doing for now is fine, but should probably have loud fixmes on it.


================
Comment at: clangd/index/Index.h:143
+  // Whether or not the symbol should be considered for completion.
+  bool ForCompletion = false;
+
----------------
This strikes me as conflating a bunch of ideas that are independently important (e.g. is the symbol top-level, is it visible outside the declaring file). We should design this carefully after considering LSP features that might use the index.


================
Comment at: clangd/tool/ClangdMain.cpp:105
 
+static llvm::cl::opt<int> LimitWorkspaceSymbolResult(
+    "workspace-symbol-limit",
----------------
the -completion-limit was mostly to control rollout, I'm not sure this needs to be a flag. If it does, can we make it the same flag as completions (and call it -limit or so?)


================
Comment at: clangd/tool/ClangdMain.cpp:129
+static llvm::cl::opt<bool> EnableIndexBasedFeatures(
+    "enable-index-based-features",
+    llvm::cl::desc("Enable index-based features such as global code completion "
----------------
if we're renaming this, just -index?


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D44882





More information about the cfe-commits mailing list