[Lldb-commits] [lldb] [lldb] Fix race condition in Process::WaitForProcessToStop() (PR #144919)

Wed Jul 30 09:54:08 PDT 2025

athierry-oct wrote:

Hi, sorry for the delay. I investigated the `TestCallThatRestarts.py` failure, and I think I’ve figured out what’s going on.

Toward the end of the test, we run:

```python
value = frame.EvaluateExpression("call_me (%d)" % (num_sigchld), options)
```

Then we call:

```python
error = process.Continue()
```

This triggers `Process::ResumeSynchronous()`, which:
- Hijacks the process events
- Calls `PrivateResume()`
- Waits for a stop event via `WaitForProcessToStop()`

The issue is that a public stop event is sent at the end of `EvaluateExpression()`, and that event is still sitting in the primary listener’s queue when `Process::ResumeSynchronous()` hijacks the process events. With my changes, that old stop event gets moved to the hijacker’s queue, so `WaitForProcessToStop()` picks it up (even though it happened *before* the resume) and `ResumeSynchronous()` returns too early.
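To make the ordering concrete, here is a toy Python model of the two queues. It is not LLDB code: `ToyListener`, `ToyBroadcaster`, `hijack_moving_pending()` and the event strings are hypothetical names made up for illustration, and the model assumes the wait simply returns the first event in the hijacker's queue.

```python
from collections import deque


class ToyListener:
    """Hypothetical stand-in for a listener; only holds a FIFO of events."""

    def __init__(self):
        self.queue = deque()


class ToyBroadcaster:
    """Hypothetical stand-in for the process event broadcaster."""

    def __init__(self, primary):
        self.primary = primary
        self.hijacker = None

    def broadcast(self, event):
        # Events go to the hijacker while one is installed, else to the primary.
        target = self.hijacker if self.hijacker is not None else self.primary
        target.queue.append(event)

    def hijack_moving_pending(self, hijacker):
        # Models the changed behavior: pending events follow the hijacker.
        hijacker.queue.extend(self.primary.queue)
        self.primary.queue.clear()
        self.hijacker = hijacker


primary = ToyListener()
broadcaster = ToyBroadcaster(primary)

# Public stop left behind by the expression evaluation.
broadcaster.broadcast("stop (end of expression)")

# Toy version of the synchronous resume: hijack, resume, wait for a stop.
hijacker = ToyListener()
broadcaster.hijack_moving_pending(hijacker)
broadcaster.broadcast("stop (after resume)")
first_event = hijacker.queue.popleft()

print(first_event)  # prints "stop (end of expression)"
```

In this model the waiter consumes the stale stop first, which matches the early return described above; the stop that actually follows the resume is still sitting in the hijacker's queue.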

It looks like moving pending events during hijacking might not always be the right thing to do. In the case of `ResumeSynchronous()`, I think we want to make sure the stop event we wait for happens *after* hijacking and resuming.

One idea: we could add a `bool move_pending_events` flag to `HijackProcessEvents()` and `RestoreProcessEvents()`. It would default to `false`, and we could set it to `true` in `StopForDestroyOrDetach()`. This way, only the behavior of `StopForDestroyOrDetach()` is modified for now.
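As a rough sketch of the idea (again a toy Python model, not the actual C++ change): the `hijack()` helper and the plain `deque` queues below are hypothetical, only the `move_pending_events` name and its `false` default come from the proposal, and the restore side is left out for brevity.

```python
from collections import deque


def hijack(primary_queue, hijacker_queue, move_pending_events=False):
    """Toy hijack step; moves pending events only when asked to."""
    if move_pending_events:
        hijacker_queue.extend(primary_queue)
        primary_queue.clear()
    return hijacker_queue  # new events are delivered here while hijacked


# Synchronous-resume style caller: keeps the default, so the stale stop stays put.
primary = deque(["stale stop"])
active = hijack(primary, deque())
active.append("stop after resume")
resume_first = active.popleft()

# Destroy/detach style caller: opts in, so the hijacker sees the pending stop.
primary2 = deque(["pending stop"])
active2 = hijack(primary2, deque(), move_pending_events=True)
destroy_first = active2.popleft()

print(resume_first)   # prints "stop after resume"
print(destroy_first)  # prints "pending stop"
```

With the default, the first event the resume-style caller sees is the one produced after hijacking, while the opt-in caller still gets the event that was already pending.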

I gave this a quick try, and now the `check-lldb-api` suite passes on my machine.

Does that approach sound reasonable to you?

https://github.com/llvm/llvm-project/pull/144919