[PATCH] D47183: [clangd] Support multiple sema code completion callbacks.

Tue May 22 02:30:52 PDT 2018

ilya-biryukov added inline comments.

================
Comment at: clangd/CodeComplete.cpp:457
       Result.StartsNestedNameSpecifier = false;
+      // FIXME: the same result can be added multiple times as the callback can
+      // be called more than once. We may want to deduplicate identical results.
----------------
ioeric wrote:
> ilya-biryukov wrote:
> > Could we fix this in the first attempt?
> > All the code around completion is really not prepared for multiple callbacks: index will be queried multiple times, we will store and log only the last CCContext, not all of them, etc.
> > I'd argue we should aim to either provide a single-callback completion or rewrite the whole code around completion to properly handle multiple callbacks (i.e. with deduplication and proper merging of the results coming from multiple callbacks, proper logging, no multiple identical requests to the index).
> > 
> > I would suggest the following measures as a hacky intermediate solution:
> > - Ignore natural language completion. The rationale: VSCode does analogous completion on empty results anyway, AFAIK clang does not provide any useful results on top of that. Other clients that we have can (and should?) probably do the same.
> > - Only record the first non-natural language completion attempt. Ignore the rest and log the failed attempts.
> > index will be queried multiple times
> Patterns of multi-context callbacks I've seen are:
> 1) natural language + Name: this happens mostly when parsing macros with stringification.
> 2) name + name: this happens in the pre-existing unit test case. I don't really understand why and how often this comes up, but I think the duplication should be eliminated.
> 3) language+recovery: haven't looked into what the recovery context does.
> > we will store and log only the last CCContext
> What's the concern about storing only the last context?
> 
> > I'd argue we should aim to either provide a single-callback completion or rewrite the whole code around completion to properly handle multiple callbacks (
> I'm not sure how we could (fully) get away with one callback without significantly changing sema parsing. This seems to be an expensive approach though.
> >with deduplication and proper merging of the results coming from multiple callbacks
> I agree. My impression is that multiple callbacks are not common and thus not as important (duplicates are better than no results IMO). But I might be wrong thinking this is uncommon. I was going to do this in a followup patch to avoid a big patch, but I'm happy to do the deduplication in the same patch if you prefer.
> 
> > proper logging
> Could you point out what logging is missing?
> > no multiple identical requests to the index
> For context combinations I've seen (natural language + name, natural language + recovery), index is still queries once. If sema does decide to call name multiple times with context that would potentially yield two index queries, we could still need to query indexes twice (don't see a big problems doing this if not a common case). For identical context that is called multiple times, we could cache potentially results. 
>  
> > I would suggest the following measures as a hacky intermediate solution:
> I think natural language is only one of the contexts that could result in multiple callbacks, so I don't think this would fully resolve our problems. 
> 
> What's the concern about storing only the last context?
If we store just one context, the code might look as if there was only one callback. I was trying to make a point that 1) storing a list of contexts or 2) not storing the context at all, are the two options that seem to be more suitable for multi-callback case.

> I'm not sure how we could (fully) get away with one callback without significantly changing sema parsing. This seems to be an expensive approach though.
Yeah, there's certainly no way we can change sema parsing.
What we could do, though, is to not call code completion during tentative parsing. This shouldn't be too hard and that's certainly the only case that can give new interesting results in practice, i.e.  doing natural language/recovery twice will certainly not change the results.

> Could you point out what logging is missing?
Signalling that there were multiple completion callbacks, showing context for each of those, etc. We seem to log individual callbacks currently, but a small summary of how many callbacks were called would be nice too.

>> no multiple identical requests to the index
> For context combinations I've seen (natural language + name, natural language + recovery), index is still queries once. If sema does decide to call name multiple times with context that would potentially yield two index queries, we could still need to query indexes twice (don't see a big problems doing this if not a common case). For identical context that is called multiple times, we could cache potentially results.

IIUC, multiple callbacks can also happen because of the tentative parsing. It means we could easily get lots of callbacks on ambiguous C++ grammar constructs.
We just need to make sure we don't do identical calls to the index in those cases. Caching index requests in the ongoing completion should definitely do it.

>> I would suggest the following measures as a hacky intermediate solution:
> I think natural language is only one of the contexts that could result in multiple callbacks, so I don't think this would fully resolve our problems.
>From my observations, all sema (i.e. non-recovery/natural lang) contexts provide mostly similar results or don't trigger together, i.e. we won't ever get non-member completion after member completions.
In theory, current completion API **can** provide results that we're gonna miss if we ignore other contexts.
In practice, I bet we would be fine.

That being said, merging of completion results is also an option that seems good. Albeit, I think that's something clang should handle in its code completion internally and clients shouldn't care about.

Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D47183