[clangd-dev] LSP Cancellations - making it universal and measurable?

Wed Sep 12 06:44:27 PDT 2018

Trying just to reply where I can add something new.

On Wed, Sep 12, 2018, 13:33 Ilya Biryukov <ibiryukov at google.com> wrote:

> It's a good thing we kept from doing intrusive changes to all of the APIs,
> prototyping only for code completion. Hopefully the refactoring won't be
> too painful.
> It's up to you to decide which approach to take at the end. Here's my take
> on trying to convince that the current API make sense
>
> I can see the benefits of the proposed API if we were designing
> cancellation for distributed web services where everything is an RPC and
> there is a clear need to propagate the cancellation information through RPC
> call chains, possibly in different processes or even on different machines.
> It feels clangd is not like that, it's a standalone tool (or a library)
> which at the end that manages its own async tasks. Propagating cancellation
> handles does not seem like a big problem there, everything is in a single
> process and data-flow is mostly simple.
>
> E.g. if what we want is to move the cancellation handling to
> JSONRPCDispatcher, we could easily do it by making the dispatched methods
> return the TaskHandles that we need to store.
> It's a simple refactoring, spanning only one layer.
>
That doesn't sound easy. The practical problem I'm trying to solve is: I
want to make all LSP actions cancelable, to determine whether we're missing
optimizations. As far as I can see, this requires adding somewhat-subtle
boilerplate to ~20 ClangdServer methods, and updating all their out-of-tree
callers.

I doubt the other use-case that we have (aforementioned internal editor
> that embeds clangd) will be much work either, though I may be wrong.
>
> > I think the biggest difference is whether the cancelability of an
> operation is represented in its static type (e.g. function return type). Is
> this what you mean?
> Yep, I meant encoding what we want in the type systems.
>
> > I guess I don't see that as a bad thing?
> My preference for encoding in type systems is that I can easily trace the
> created task back to whoever performs cancellations.
> It feels like this is gonna be slightly trickier in the new approach,
> since not only the consumers, but also the producers of cancellable tasks
> can now be anywhere up the context call chain.
>
> > Cancellation is a cross-cutting concern that you mostly want to ignore,
> except at layers where: a) you can meaningfully abort early to save some
> work, or b) you want to start a chunk of work and maybe cancel it later.
> Agree that the cancellation *checking* cuts though all the layers and
> requiring passing objects through all of the layers is a pain.
> Not so sure about the "producers" of tasks. We can't possibly keep
> functions that start cancellable work unchanged when adding cancellation.
>
Yeah this is more subtle.
The scope of a cancellation is a concern of the caller, not the callee (the
callee just cares that it is in scope).
Suppose I have a function handleUserRequest() that calls runCompletion()
which calls {addDocument(); codeComplete()}. Then the user closes the file.
It's the "handleUserRequest" we logically want to cancel. "runCompletion"
is the one I think should be cut across.

Corollary: I think only the entity that actually wants to cancel things
should create cancelable scopes.

In both the new proposal and the current implementation, we'll have to:
> - Allow the functions that can be cancelled to report they were cancelled
> to the callers. I guess both approaches would use the CancelledError for
> this.
> - Mark the functions that support cancellation somehow. Either with a
> comment (the new proposal) or with an explicit return type (the current
> implementation).
>
In the proposal, this is a class/codebase comment, and the is no
distinction between functions. ("ClangdServer functions/TUScheduler tasks
support cancellation on a best-effort basis").

The cancellation tasks would be allowed to be created further up the call
> chain, but that adds some "magic" to the code and makes the clients
> responsible for creating the tasks properly and writing all the boilerplate.
>
> > Examples of why this seems right:
> > - there's no reason that a cancelable span should be coupled 1:1 with an
> LSP method
> Totally agree, why do you feel the cancellable span is coupled to an LSP
> method in the current API?
>
(Sorry, I meant to say ClangdServer method, which is not quite the same)
Because ClangdServer creates the spans.

> - whether a function will *respond* to cancellation is always a *dynamic*
> property. This is a QoI issue, not a contract.
> Totally agree.
>
> > - It allows us to support cancellation of all methods without adding a
> bunch of bookkeeping to each one (or equivalent in templates)
> I don't think bookkeeping is much of a pain, most methods will just return
> TaskHandles and at the level where it's appropriate to cancel, where one
> will need to store them.
>
Sorry, here I was referring to the ClangdServer/ClangdLSPServer
implementation. The additions required to each cancelable methods are IMO
moderately intrusive and error-prone.

I guess the arguments are about the same as with other things that could
> potentially be passed through Context.
> I would prefer to use Context as little as possible and in that case I
> would prefer explicit data flow, but can see your point if you don't think
> that way.
> > - it allows embedders to support cancelling methods at the layer it
> makes sense without plumbing cancellation objects around
> Point taken, I view the fact that we can trace the data flow back to the
> cancellation sites as a benefit.
>
> - it avoids divergences in APIs based on whether we've implemented
> early-exit for this operation or not
> In the current implementation, the APIs that we believe should be
> cancellable should just always indicate they can be cancelled (i.e. create
> a new task), even if they won't do early exit in practice.
>
Not opposed to this in principle (as long as "we believe should be
cancelable" is "every async function"), but if you mean only ClangdServer
methods, why?

> - putting values in the return type that are not the result of the
> computation surprises the reader
> Providing results in a callback also makes the code slightly tricky.
>
Having
>
cancellation boilerplate does not make the client code any better either.
> (At any layer really).
> Granted that it's probably easier to just support cancelling everything
> somewhere up the call chain with the proposed implementation, but I would
> argue it's not much harder in the current approach, the only thing that's
> required is plumbing the TaskHandle up to that layer. Does not feel like a
> lot of work (but, again, we seem to disagree here).
>
>
> Overall, if you feel strongly we should prefer the new approach over the
> current implementation, happy to submit to your point of view.
> I feel it's not much work to support the current implementation in the
> callers that we have, but if you don't see the benefit in the current APIs
> and strongly feel the new ones are better, let's just do the new ones.
>
So I think we're talking mostly about the "implementation" part of point 2
in the original email.
I think this is the right thing to do, but from a practical point of view,
what I'm trying to do is instrument all APIs for cancellation.
Do you think this is should be done in a different way, or that it's
undesirable, or it's not so important?
The cancellation code that's in Clangd[LSP]Server code complete is
intrusive enough that I don't feel adding it is justified without an
important performance win.

>
> On Wed, Sep 12, 2018 at 11:59 AM Sam McCall <sammccall at google.com> wrote:
>
>> Guess I was right about which piece would be controversial.
>> (For those missing context, Cider is a non-public editor that embeds
>> ClangdServer)
>>
>> On Wed, Sep 12, 2018 at 11:28 AM Ilya Biryukov <ibiryukov at google.com>
>> wrote:
>>
>>> Hi Sam,
>>>
>>>
>>> A few things that come to mind:
>>> >  - other users of ClangdServer would set up cancellation in the same
>>> way: by creating a task handle and calling setCurrentTask() before invoking
>>> a request. Or they can not do so if they don't support cancellation
>>> (isCancelled() returns false if there's no task).
>>>
>>> My stance is that explicit APIs for cancellation are better approach.
>>>
>> I'd like to clarify what you mean by "explicit", because the
>> *cancellation* API is explicit in both cases, and the *coordination*
>> between the cancellation and the cancel-check is (to a first approximation)
>> magic in both cases.
>>
>> I think the biggest difference is whether the cancelability of an
>> operation is represented in its static type (e.g. function return type). Is
>> this what you mean?
>>
>>
>>> They clearly state the contract of the API, it's impossible to
>>> accidentally reuse the existing task handle (ClangdServer is creating a new
>>> task every time) multiple times for different tasks from the client code
>>> and they are (arguably) easier to use from Cider.
>>> The "task in the context" approach, OTOH, means users will have to read
>>> through the comments to even discover that the cancellation is there.
>>>
>> I guess I don't see that as a bad thing? Cancellation is a cross-cutting
>> concern that you mostly want to ignore, except at layers where: a) you can
>> meaningfully abort early to save some work, or b) you want to start a chunk
>> of work and maybe cancel it later.
>> Examples of why this seems right:
>>  - there's no reason that a cancelable span should be coupled 1:1 with an
>> LSP method
>>  - whether a function will *respond* to cancellation is always a
>> *dynamic* property. This is a QoI issue, not a contract.
>>
>> I may be missing the reasons on why the proposed approach is better. Any
>>> suggestions I'm missing?
>>>
>> - It allows us to support cancellation of all methods without adding a
>> bunch of bookkeeping to each one (or equivalent in templates)
>> - it allows embedders to support cancelling methods at the layer it makes
>> sense without plumbing cancellation objects around
>> - it avoids divergences in APIs based on whether we've implemented
>> early-exit for this operation or not
>> - putting values in the return type that are not the result of the
>> computation surprises the reader
>>
>> > 1) task should record the time of requested cancellation
>>> +1
>>> > 3) TUScheduler should be cancellation-aware
>>> +1
>>>
>>> > 4) We should hide the Task object.
>>> > (again, this borrows heavily from go)
>>> Go was definitely designed with cancellation (I assume of both RPCs and
>>> general computations?) in mind,
>>>
>> For what it's worth, it wasn't - Context and cancellation was very much
>> bolted on in the same way it was in clangd.
>> The only real difference is they decided not to use TLS for Context. But
>> they decided not to signal cancelability explicitly in the API for the
>> usual reasons (intermediate stack frames don't care).
>>
>>
>>> but my view on this is that in C++ it's more idiomatic to make the
>>> contracts like this explicit in the API. TaskHandle exists to make sure the
>>> callers of async APIs get an explicit object they can poke with, see the
>>> first comment about explicit vs implicit APIs.
>>>
>>> > 5) Random thought: we could support deadlines (automatic cancellation
>>> after some time), this is useful in hosted scenarios though probably not
>>> for standalone workstation use.
>>> +1, unless it's trivial to do it on Cider side. In which case, maybe we
>>> could only support this in Cider to avoid adding the code to OS version
>>> that we won't use outside our environment.
>>>
>>>
>>>
>>>
>>> On Wed, Sep 12, 2018 at 10:57 AM Sam McCall <sammccall at google.com>
>>> wrote:
>>>
>>>> Hey Ilya and Kadir,
>>>>
>>>> Was trying to understand how much we (can) win from cancellation, and
>>>> what's involved in instrumenting a LSP method for cancellation.
>>>>
>>>> Have a couple of proposals, wanted to discuss first them rather than
>>>> sending a patch because they're separate but interact.
>>>>
>>>> 1) task should record the time of requested cancellation. For analysis,
>>>> there are 3 interesting timestamps - task start, (optional) cancellation,
>>>> and end. The creator of the task currently can get start and end (via
>>>> context cleanup), but not cancellation time. This allows us to measure how
>>>> much we can improve cancellation (by bailing out "lame duck" tasks earlier).
>>>> Implementation is easy, just change the atomic<bool> to
>>>> atomic<steady_clock::rep>
>>>>
>>>> 2) we should support cancellation of any method, even if early bailout
>>>> isn't yet implemented.
>>>> Main benefit: we can then measure the lame duck time for all methods
>>>> (as above), and know where to implement early bailout. Side benefit: more
>>>> uniform/less API churn.
>>>> Implementation (I have a prototype):
>>>>  - LSP $/cancelRequest would move to JSONRPCDispatcher so it can cut
>>>> across all methods. Cleanup of TaskHandles would be handled with a context
>>>> destructor.
>>>>  - other users of ClangdServer would set up cancellation in the same
>>>> way: by creating a task handle and calling setCurrentTask() before invoking
>>>> a request. Or they can not do so if they don't support cancellation
>>>> (isCancelled() returns false if there's no task).
>>>> This is very similar to golang context cancellation.
>>>>
>>>> 3) TUScheduler should be cancellation-aware
>>>> This seems like an easy, cross-cutting win, but we should measure.
>>>>
>>>> 4) We should hide the Task object - it adds API noise and it offers too
>>>> many facilities to the wrong actors.
>>>> (maybe this is controversial, and somewhat less related).
>>>>  - Task::isCancelled() appears to be redundant, you never want to check
>>>> another task's status and checking for cancellation in a tight loop is an
>>>> antipattern that doesn't seem worth optimizing for
>>>>  - cancel() is only for the creator of a task (current API enforces
>>>> this)
>>>>  - TaskHandle/ConstTaskHandle/getCurrentTask just exist to support
>>>> exposing these details
>>>>  - most actions have obvious extensions to cases where there is no
>>>> active task, but the current API makes this awkward
>>>> So this would leave us with something like (modulo names):
>>>> bool isTaskCancelled();
>>>> using Canceler = std::function<void()>;
>>>> pair<Context, Canceler> startCancelableTask();
>>>> (again, this borrows heavily from go
>>>>
>>>> 5) Random thought: we could support deadlines (automatic cancellation
>>>> after some time), this is useful in hosted scenarios though probably not
>>>> for standalone workstation use.
>>>>
>>>> WDYT?
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Ilya Biryukov
>>>
>>
>
> --
> Regards,
> Ilya Biryukov
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/clangd-dev/attachments/20180912/4071552a/attachment-0001.html>