[Lldb-commits] [PATCH] D83446: [WIP][lldb/Reproducers] Synchronize the command interpreter with asynchronous events

Thu Jul 9 13:33:29 PDT 2020

JDevlieghere added a comment.

In D83446#2142473 <https://reviews.llvm.org/D83446#2142473>, @labath wrote:

> In D83446#2142030 <https://reviews.llvm.org/D83446#2142030>, @JDevlieghere wrote:
>
> > Agreed. I believe at some point (long before I started working on LLDB) all the GDB communication was limited to a single thread, but the overhead of having to talk and synchronize with that thread was too costly and it got relaxed.
>
>
> I don't believe that is relevant here. We still have a gdb lock, which ensures that at most one thread is communicating with the server at any given time, even though it's not always the same thread. But that lock is far too granular -- it works at the packet level. What we need here is a way to ensure that process of fetching the stop reason (as a whole) does not interleave with the subsequent command.

It is relevant for the reproducer and the unexpected packet. If there was one thread dealing with these events, we could do the synchronization there. The problem is that at the server side it's already too late. Anyway, this is not something that I want to pursue :-)

>> How would this be different from synchronous mode and how would you make this work in asynchronous mode?
> 
> The way I see it, this is mostly orthogonal to the (a)synchronous mode issue. I consider the problem to be in the `gdb-remote` command, which behaves the same way in both modes. It causes a eBroadcastBitStateChanged/eStateStopped to be sent, and in response to that the default event handler prints the stop reason string. Normally this happens very quickly. So quickly, that one would be forgiven to consider the stop reason string to be the "output" of the gdb-remote command. My proposal is to not consider this command to be "complete" until that event is processed, essentially making that string _be_ the output of the command.

That's the approach I described below. Blocking on the stop event seems at odds with the asynchronous mode. Does a connect result in a stop for every platform we support?

> For the asynchronous mode, I do think that we will need some form of a global "clock". I'm just not sure that this is the right form for it. To me it seems that this is mostly about the relationship of the command interpreter and the gdb-remote protocol -- if I do an asynchronous "process continue", followed (10 seconds later) by an "process interrupt", the gdb replayer needs to know that it should not respond to the $c packet, but wait for the ^C interrupt packet which will come from the interrupt command.
> 
> Maybe you could try explaining how you wanted to make this work for asyncrhonous mode.

Although I want replay to behave a close to capture as possible, I'm fine with continuing to replay in synchronous mode.

> (Btw, I think the example above may not even need a global clock -- the gdb replayer can just note that the $c packet should not be replied to, and then send the stop-reply as a response to the ^C packet.)

That's already how it works today. For the reproducers the (a)synchronous mode is all about the command interpreter and when it issues the next command.

>> As an aside: the first thing I considered here was making this particular command //synchronous// by blocking until we get the stop event.
> 
> Now, here I get a feeling that one of us is misunderstanding the (a)synchronous concept. The way I remember this is that when we want to do a synchronous command, we wait for the Stopped event by injecting a custom listener between the source, and the final listener. This custom listener fetches the event, does something, and then broadcasts it to the final listener. This ensures that the process is stopped and that most of the work that should happen when it stops has happened, but it does _not_ synchronize with the processing done by the final listener (which is what we want here).
> 
>> During replay the execution is always synchronous (which is something I don't like and was hoping to get rid off with this patch). That would only solve the problem for this command and adds a bunch of assumptions about connections resulting in stops. I know this is always true for gdb platforms, but maybe not for other platforms.
> 
> Actually, I wanted this to be fully general -- after the execution of _any_ command, we make sure to drain _all_ events from the default event handler queue. This handling can still be done in the Debugger object, all the command interpreter would need to do is call `debugger->DrainEventHandler()` at an appropriate time.
> 
>> Also currently the debugger is the only one that deals with these events, I'm not super excited to have that logic spread out with some commands handling it synchronously and others doing it asynchronously.
> 
> 
> 
>> Based on your description I don't yet see how the `EventDataReceipt` thingy would work. If we block from the moment the event is posted, then what's the goal of having the event handling logic running in a separate thread? If we block between when the event is being handled until it finished, then how would you avoid a race between when the event is broadcast and when it's being handled? Can you elaborate a bit on this?
> 
> For this particular case, I don't think there's an advantage (and there certainly are disadvantages) to doing this in a separate thread. IIUC (I also wasn't present at that time), this event handler thread exists so that we can respond to changes done through the SB API. So, if the something does the SB equivalent of "gdb-remote" or "frame select" or whatever, we still can print a message to the console that the state of the process has changed. Reusing this mechanism for the command line kinda makes sense, since we want things to also work the other way around -- have the IDE respond to process changes which are being done through the command line.
> 
> The way I was imagining this working is that the `DrainEventHandler` function would send a special event to the default event handler thread, and wait until it gets processed. Since the events are processed in FIFO order, this would ensure that all preceding events have been processed as well. Note that this would go to the default event handler even if the process events happen to be going to a different listener (e.g. one provided by the IDE). In that case the default event handler would just return straight away -- which is fine, because in this case the stop reason would not get printed.
> 
> That said, the talk of SB API has reminded me that a similar race can exist between the SB commands and the event handler thread. Given that the SB replayer also streams SB commands as fast as it can, the same race can occur if the user does the SB equivalents of the above command line commands (this is a problem with both solutions).

I actually designed this solution with that in mind. I imagine having a synchronizer for every command interpreter and a different one for the API replay. Why would the current patch not work for that?

> Now obviously, we don't want to drain the event queue after every SB call. I'm not exactly sure what that means though...

Repository:
  rLLDB LLDB

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83446/new/

https://reviews.llvm.org/D83446