[Lldb-commits] [lldb] [LLDB] Ptrace seize dead processes on Linux (PR #137041)

Wed May 21 08:16:05 PDT 2025

https://github.com/labath commented:

[I'm sorry, this is going to be long]

Most of David's comments are there because you copied the existing attach implementation. Even if we go down the path of different attach methods, I don't think it's necessary to copy all of that. Actual attaching is a small part of that function. The rest of the stuff, which deals with looping, gathering thread ids, and error messages, could be handled by common code.

I say *if* because it's not clear to me that we've given up on using the common code path we discussed on the rfc thread. I think it could look something like this:
```
  ptrace(PTRACE_SEIZE, pid, 0, (void*)(PTRACE_O_TRACEEXIT));
  syscall(SYS_tgkill, pid, pid, SIGSTOP);
  ptrace(PTRACE_INTERRUPT, pid, 0, 0);
  int status;
  waitpid(pid, &status, __WALL);
  if (status>>8 == (SIGTRAP | PTRACE_EVENT_STOP << 8)) {
    ptrace(PTRACE_CONT, pid, 0, 0);
    waitpid(pid, &status, __WALL);
  }
```

If you run this on a "regular" process you get this sequence of events:

```
ptrace(PTRACE_SEIZE, 31169, NULL, PTRACE_O_TRACEEXIT) = 0
tgkill(31169, 31169, SIGSTOP)           = 0
ptrace(PTRACE_INTERRUPT, 31169)         = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=31169, si_uid=1000, si_status=SIGSTOP, si_utime=0, si_stime=0} ---
wait4(31169, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WALL, NULL) = 31169
```
.. or this one:
```
ptrace(PTRACE_SEIZE, 31169, NULL, PTRACE_O_TRACEEXIT) = 0
tgkill(31169, 31169, SIGSTOP)           = 0
ptrace(PTRACE_INTERRUPT, 31169)         = 0
wait4(31169, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}|PTRACE_EVENT_STOP<<16], __WALL, NULL) = 31169
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_STOPPED, si_pid=31169, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
ptrace(PTRACE_CONT, 31169, NULL, 0)     = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=31169, si_uid=1000, si_status=SIGSTOP, si_utime=0, si_stime=0} ---
wait4(31169, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WALL, NULL) = 31169
```

The difference is in whether the kernel is fast enough to notice the SIGSTOP before it PTRACE_INTERRUPTS the process (the fact that this results in nondeterministic behavior might be worth filing a bug report for the kernel). With this simple code, I usually get the second version, but this is really a matter of microseconds. Even inserting `usleep(0)` between the SIGSTOP and PTRACE_INTERRUPT calls is enough to make the process immediately stop with SIGSTOP.

However, that doesn't matter. After this code is done, the process is stopped with a SIGSTOP, just like it was in the PTRACE_ATTACH case. For a "dead" process, the sequence slightly different:
```
ptrace(PTRACE_SEIZE, 31169, NULL, PTRACE_O_TRACEEXIT) = 0
tgkill(31169, 31169, SIGSTOP)           = 0
ptrace(PTRACE_INTERRUPT, 31169)         = 0
wait4(31169, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}|PTRACE_EVENT_EXIT<<16], __WALL, NULL) = 31169
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=31169, si_uid=1000, si_status=SIGTRAP, si_utime=0, si_stime=0} ---
```
In this case you always get a PTRACE_EVENT_EXIT stop.

The nicest part about this is that could be used to distinguish between the two process states -- without the core dumping flag, the proc filesystem, or anything else. I think it's theoretically possible to catch a live thread while it's exiting and get a PTRACE_EVENT_EXIT, but getting that for *all* threads in the process is very unlikely.

Or, instead of making this a process wide-property, we could make this a property of a specific thread. We could say that a *thread* that is in an "exiting" state is not allowed to run expressions. And this dead process would just be a special case of a process which happens to have all threads in the exiting state.

And the nicest part about *that* is that this is extremely close to being testable. We no longer need a core-dumping process -- we just need a process that's stopped in a PTRACE_EVENT_EXIT. Right now, we don't have a way to do that, but we're really close to that. One way to do that would be to let the user request stopping a thread at exit. If we implement all of the rest, this will be mainly a UX question of how we want to expose this to the user. I think we could make some sort of an experimental setting for that, or even just some secret packet that we send with the `process plugin packet` command.

What do you say to all this? I realize I am sort of asking you my pet feature, but I think that by framing your feature (debugging of dead processes) as a special case of this (debugging exiting processes), we can avoid most of the problematic issues with this PR (divergence of implementation, impossibility of testing).

https://github.com/llvm/llvm-project/pull/137041