[Lldb-commits] [PATCH] D86652: [lldb] Fix ProcessEventData::DoOnRemoval to support multiple listeners

Jim Ingham via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Fri Aug 28 09:55:19 PDT 2020


jingham added a comment.

In D86652#2244043 <https://reviews.llvm.org/D86652#2244043>, @labath wrote:

> This is going to derail this patch somewhat (sorry), but since this is an important long-standing open issue, I think we should discuss it in more detail.
>
> The part that's been bothering me about this for quite some time is... Why do we even have this DoOnRemoval stuff in the first place?
>
> I mean, instead of doing this work in `DoOnRemoval`, why don't we do it /before/ enqueueing the relevant event? Then the event would be of a purely informative character and it would not matter whether it's received by one, two, ten, or zero listeners. Of course, a user who wanted to program lldb in a correct and race-free manner would /have to/ listen to these events (as otherwise they would not know when it is safe to run some actions), but that would be something that's entirely up to them.
>
> Back when I was debugging some races in tests with asynchronous listeners, I remember it was pretty hard to reason about things and prove they were correct, since the dequeueing of the event (and the resulting flurry of actions within lldb) could come at any moment. If those actions were done internally to lldb, they would be sequenced more deterministically and be easier to reason about than if they are executed whenever the user chooses to dequeue an event. (I remember things got particularly messy when a test finished before dequeuing an event, at which point the processing of the event was executing concurrently with the `process.Kill()` statement we execute at the end of each test.)

It would certainly be worth rethinking this; these were decisions made very early in the development of lldb...

I did it the way I did for several reasons:

The first was to get the work done in DoOnRemoval - which is basically anything lldb can do, since that's where the breakpoint callbacks get triggered - off the private state thread.  The private state thread needs to be alive, receiving private events and running the thread plans, in order to do the work required by breakpoint callbacks, and while I do have a system to spin up another private state thread if you need one while doing some piece of work on the private state thread, that seemed fragile to me and I didn't want to lean on it too heavily.  It is also important to manage the transition from lldb doing internal work on the incoming event stream to handing control over to the user.  Internally, we have to let ourselves do things based off the private state, but external users are gated by the public state instead.  Having the public state flip when the event gets handed to the user was a convenient place to put this barrier.
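
(For concreteness, here is a tiny Python sketch of what that barrier looks like from the SB API side; `process` is assumed to be an SBProcess obtained elsewhere, and none of this is part of the patch:)

  # Illustration only: SBProcess::GetState() reports the *public* state,
  # which only flips to eStateStopped once the stopped event has been
  # fetched off the listener's queue (that hand-off is where DoOnRemoval
  # currently runs).
  import lldb

  state = process.GetState()        # public state; may still be eStateRunning
  if state == lldb.eStateStopped:
      # Only now is it safe for client code to poke at threads and frames.
      frame = process.GetSelectedThread().GetFrameAtIndex(0)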

The other reason I did it this way was that I didn't want the process state and the event stream to get out of sync.  If we do all the work on an event, including setting the public state to stopped, before posting the event, then you can easily get into the situation where the public state says "stopped" when you haven't yet fetched the stopped event off of the event queue.  That inconsistency seemed likely to result in confusing errors in use of the APIs.
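
(For reference, the kind of asynchronous client loop this consistency guarantee is meant to support looks roughly like the following Python sketch; the binary path is a placeholder:)

  import lldb

  debugger = lldb.SBDebugger.Create()
  debugger.SetAsync(True)
  target = debugger.CreateTarget("/tmp/a.out")      # placeholder binary
  process = target.LaunchSimple(None, None, None)
  listener = debugger.GetListener()

  event = lldb.SBEvent()
  done = False
  while not done:
      if not listener.WaitForEvent(1, event):       # 1-second poll
          continue
      if not lldb.SBProcess.EventIsProcessEvent(event):
          continue
      state = lldb.SBProcess.GetStateFromEvent(event)
      if state == lldb.eStateStopped:
          # The public state flips to stopped as this event is handed out,
          # so the process also *reports* stopped here; the two cannot
          # disagree from the client's point of view.
          for thread in process:
              print(thread.GetStopDescription(256))
          process.Continue()
      elif state == lldb.eStateExited:
          done = True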

If we can come up with another way to achieve these goals but do the user-facing work of processing the event somewhere else, I don't have any objections.

> In D86652#2242770 <https://reviews.llvm.org/D86652#2242770>, @ilya-nozhkin wrote:
>
>> One of the possible solutions is to send a specially marked event to a primary listener first, and then have this event trigger a handshake in its destructor (which would mean that the primary listener doesn't need the event anymore). The handshake could be implemented as just rebroadcasting a new instance of the event to the other listeners (or via unlocking some additional mutex / notifying some condition variable). However, the destructor may not be invoked immediately after event processing is finished. For example, the primary listener's thread might define an EventSP before the listening cycle and not reset it until GetEvent returns a new one (which may not happen until the next stop).
>
> This primary listener technique could be implemented purely in your own code, could it not? You could have one listener which listens to lldb events and then rebroadcasts them (using arbitrary techniques, not necessarily SBListener) to any number of interested threads that want to act on them. That might make it easier to ensure there are no races in your code, and would avoid stepping on some of the internal lldb races that I am sure we have... Otherwise I (like Jim) expect that you will still need some sort of communication between the various listeners listening to process events, as they'd need to coordinate.
>
>> Anyway, the concept of primary and secondary listeners requires forbidding secondary listeners from doing "step" or "continue" at all. So, why can't we do the same without any additional code? I mean, I think the most common case of multiple listeners is a Python extension adding a listener that reads/writes some memory/registers on each stop while taking the RunLock.
>
> Yes, you could take the run lock, which would guarantee that the other listeners don't get to resume the process. But how do you guarantee that you _can_ take the run lock -- some other thread (the primary listener) could beat you to it and resume the process, so you'd miss the chance to read/write those registers. That seems very fragile. A stop hook does look like a better way to achieve these things.
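
(For illustration, the user-side rebroadcasting scheme @labath suggests above could look roughly like this Python sketch; the names `fan_out` and `subscriber_queues` are invented, and `listener` is assumed to be the SBListener receiving the process events:)

  import queue
  import threading
  import lldb

  def fan_out(listener, subscriber_queues):
      """Single 'primary' thread: the only code that touches the SBListener.
      It forwards the observed state to plain Python queues so other threads
      never race with lldb's event machinery."""
      event = lldb.SBEvent()
      while True:
          if not listener.WaitForEvent(1, event):
              continue
          if not lldb.SBProcess.EventIsProcessEvent(event):
              continue
          state = lldb.SBProcess.GetStateFromEvent(event)
          for q in subscriber_queues:
              q.put(state)              # rebroadcast without any SBListener
          if state == lldb.eStateExited:
              break

  # Usage sketch, assuming 'listener' comes from the code that launched the process:
  #   queues = [queue.Queue() for _ in range(3)]
  #   threading.Thread(target=fan_out, args=(listener, queues), daemon=True).start()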



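(Likewise, the stop-hook route mentioned above can be scripted without touching the event machinery at all; here is a rough sketch using the long-standing command-form stop hook, with made-up names `stophook_example.py` / `dump_regs`:)

  import lldb

  def dump_regs(debugger, command, exe_ctx, result, internal_dict):
      """Dump the register sets of the frame in the command's execution context."""
      frame = exe_ctx.GetFrame()
      if not frame.IsValid():
          return
      for reg_set in frame.GetRegisters():      # SBValueList of register sets
          result.AppendMessage(str(reg_set))

  def __lldb_init_module(debugger, internal_dict):
      debugger.HandleCommand(
          "command script add -f stophook_example.dump_regs dump_regs")

  # Then, with a target selected:
  #   (lldb) command script import stophook_example.py
  #   (lldb) target stop-hook add --one-liner "dump_regs"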

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86652/new/

https://reviews.llvm.org/D86652


