[PATCH] D40548: [clangd] Symbol index interfaces and an in-memory index implementation.
Sam McCall via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Dec 15 02:32:08 PST 2017
sammccall added inline comments.
Comment at: clangd/index/Index.h:134
+ virtual bool
+ fuzzyFind(Context &Ctx, const FuzzyFindRequest &Req,
+ std::function<void(const Symbol &)> Callback) const = 0;
> malaperle wrote:
> > ioeric wrote:
> > > malaperle wrote:
> > > > I think a more generic std::function would be useful, similar to the indexstore's filter
> > > > ```
> > > > bool searchSymbols(llvm::function_ref<bool(IndexRecordSymbol, bool &stop)> filter,
> > > > llvm::function_ref<void(IndexRecordSymbol)> receiver)
> > > > ```
> > > Do you have an use case in mind which requires different filtering? This could probably be a separate interface.
> > >
> > > I think the behavior of fuzzy finding is well-defined, and how filtering is done is an implementation detail. User-specified filter might make implementation less flexible, especially for large indexes that are not managed in memory.
> > For example
> > - Searching by USR
> > - By non-qualified name (useful also for Open Workspace Symbol)
> > - Most likely other filters, hmm, by Symbol kind, language?
> > - No filter, i.e. retrieve *all* symbols. Useful mainly for development and to get index statistics
> > There could also be a boolean in the callback return value (or the filter callback) to limit the number of returned values.
> > I haven't tried locally but it looks like it would be pretty doable to change the FuzzyFindRequest into a std::function.
> > A big disadvantage of having just one method though is that is would be difficult for an implementation to optimize for a specific type of query. For example, if you search by non-qualified name, an implementation could use a mapping of non-qualified-name-to-USR but it would be difficult to know what to do given only the std::function/filter passed by argument.
> > In that sense, perhaps it is better to have multiple methods for each use cases. Or perhaps some enum for each kind of query. What do you think?
> Adding new interfaces is an option. It should also be easy to extend the existing interface to cover more use cases.
> > Searching by USR
> We definitely want an interface for this. I can see this being useful for Goto-definition and other features.
> > By non-qualified name (useful also for Open Workspace Symbol)
> I think `fuzzyFind` can support this, with fuzzy matching optionally turned off if preferred.
> > Most likely other filters, hmm, by Symbol kind, language?
> Similarly, this could be achieved by extending the find request to include more filters e.g. symbol kind.
> > No filter, i.e. retrieve *all* symbols. Useful mainly for development and to get index statistics
> This can be done today by setting an empty fuzzy filter.
As Eric said, we should be able to add more options here.
But why are we reluctant to add an arbitrary std::function, which is more elegant and flexible?
- it essentially forbids remote indexes. Because functions aren't serializable, the only possible implementation is to send all the symbols to clangd. For our main codebase, this is would be ~1G for just qnames, and a lot more for full symbols.
- A slightly more subtle disadvantage is because functions aren't serializable, it's harder to debug them - you can't just log the query.
- filtering functions are easy to implement (so can be distributed over callsites), but scoring functions are hard (so should be more centralized). They tend to operate on the same data. It's not obvious how to achieve this with a filter API.
rCTE Clang Tools Extra
More information about the cfe-commits