[Lldb-commits] [PATCH] D83446: [WIP][lldb/Reproducers] Synchronize the command interpreter with asynchronous events

Pavel Labath via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Thu Jul 9 13:11:21 PDT 2020


labath added a comment.

In D83446#2142030 <https://reviews.llvm.org/D83446#2142030>, @JDevlieghere wrote:

> In D83446#2141713 <https://reviews.llvm.org/D83446#2141713>, @labath wrote:
>
> > I'm wondering if this problem does not go beyond reproducers... The fact that we can have two threads running in parallel, doing stuff, without much synchronization seems like a bad idea in general. In theory, something similar could happen during normal operation as well, if the user was fast enough to type "cont<Return>" before the stop reason is printed. In the non-reproducer mode we wouldn't crash with the unexpected packet assertion, but we would probably do something unexpected nonetheless. Of course, a real human user won't be fast enough to do that, but I wouldn't rule out this being the cause of the flakiness of some of our pexpect tests. And I'm not sure what happens when sourcing command files, which is fairly similar to running a reproducer. How would the events be processed there?
>
>
> Agreed. I believe at some point (long before I started working on LLDB) all the GDB communication was limited to a single thread, but the overhead of having to talk and synchronize with that thread was too costly and it got relaxed.


I don't believe that is relevant here. We still have a gdb lock, which ensures that at most one thread is communicating with the server at any given time, even though it's not always the same thread. But that lock is far too granular -- it works at the packet level. What we need here is a way to ensure that process of fetching the stop reason (as a whole) does not interleave with the subsequent command.

> 
> 
>> So, I'm wondering if we shouldn't add some form of synchronization here, which would be independent of the reproducer mode we're running in. That would solve the reproducer problem as a consequence, but it seems like it would be generally useful. One way to do that could be to send some sort of a ping event (EventDataReceipt ?) to the event handler thread before or after running a command, and wait for it to report back. That should ensure that all events produced by the previous command have been fully processed...
> 
> How would this be different from synchronous mode and how would you make this work in asynchronous mode?

The way I see it, this is mostly orthogonal to the (a)synchronous mode issue. I consider the problem to be in the `gdb-remote` command, which behaves the same way in both modes. It causes a eBroadcastBitStateChanged/eStateStopped to be sent, and in response to that the default event handler prints the stop reason string. Normally this happens very quickly. So quickly, that one would be forgiven to consider the stop reason string to be the "output" of the gdb-remote command. My proposal is to not consider this command to be "complete" until that event is processed, essentially making that string _be_ the output of the command.

For the asynchronous mode, I do think that we will need some form of a global "clock". I'm just not sure that this is the right form for it. To me it seems that this is mostly about the relationship of the command interpreter and the gdb-remote protocol -- if I do an asynchronous "process continue", followed (10 seconds later) by an "process interrupt", the gdb replayer needs to know that it should not respond to the $c packet, but wait for the ^C interrupt packet which will come from the interrupt command.

Maybe you could try explaining how you wanted to make this work for asyncrhonous mode. (Btw, I think the example above may not even need a global clock -- the gdb replayer can just note that the $c packet should not be replied to, and then send the stop-reply as a response to the ^C packet.)

> As an aside: the first thing I considered here was making this particular command //synchronous// by blocking until we get the stop event.

Now, here I get a feeling that one of us is misunderstanding the (a)synchronous concept. The way I remember this is that when we want to do a synchronous command, we wait for the Stopped event by injecting a custom listener between the source, and the final listener. This custom listener fetches the event, does something, and then broadcasts it to the final listener. This ensures that the process is stopped and that most of the work that should happen when it stops has happened, but it does _not_ synchronize with the processing done by the final listener (which is what we want here).

> During replay the execution is always synchronous (which is something I don't like and was hoping to get rid off with this patch). That would only solve the problem for this command and adds a bunch of assumptions about connections resulting in stops. I know this is always true for gdb platforms, but maybe not for other platforms.

Actually, I wanted this to be fully general -- after the execution of _any_ command, we make sure to drain _all_ events from the default event handler queue. This handling can still be done in the Debugger object, all the command interpreter would need to do is call `debugger->DrainEventHandler()` at an appropriate time.

> Also currently the debugger is the only one that deals with these events, I'm not super excited to have that logic spread out with some commands handling it synchronously and others doing it asynchronously.



> Based on your description I don't yet see how the `EventDataReceipt` thingy would work. If we block from the moment the event is posted, then what's the goal of having the event handling logic running in a separate thread? If we block between when the event is being handled until it finished, then how would you avoid a race between when the event is broadcast and when it's being handled? Can you elaborate a bit on this?

For this particular case, I don't think there's an advantage (and there certainly are disadvantages) to doing this in a separate thread. IIUC (I also wasn't present at that time), this event handler thread exists so that we can respond to changes done through the SB API. So, if the something does the SB equivalent of "gdb-remote" or "frame select" or whatever, we still can print a message to the console that the state of the process has changed. Reusing this mechanism for the command line kinda makes sense, since we want things to also work the other way around -- have the IDE respond to process changes which are being done through the command line.

The way I was imagining this working is that the `DrainEventHandler` function would send a special event to the default event handler thread, and wait until it gets processed. Since the events are processed in FIFO order, this would ensure that all preceding events have been processed as well. Note that this would go to the default event handler even if the process events happen to be going to a different listener (e.g. one provided by the IDE). In that case the default event handler would just return straight away -- which is fine, because in this case the stop reason would not get printed.

That said, the talk of SB API has reminded me that a similar race can exist between the SB commands and the event handler thread. Given that the SB replayer also streams SB commands as fast as it can, the same race can occur if the user does the SB equivalents of the above command line commands (this is a problem with both solutions).

Now obviously, we don't want to drain the event queue after every SB call. I'm not exactly sure what that means though...


Repository:
  rLLDB LLDB

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83446/new/

https://reviews.llvm.org/D83446





More information about the lldb-commits mailing list