[lldb-dev] eStateInvalid handling in ProcessGDBRemote::AsyncThread...

Wed Apr 28 19:20:57 PDT 2021

I have a question about the handling of SendContinuePacketAndWaitForResponse when an attempt to interrupt the process times out.

One oddity I noticed is in the handling of the eStateInvalid return from SendContinuePacketAndWaitForResponse (in ProcessGDBRemote::AsyncThread).  When there's an async interrupt request in flight and the ReadPacket in SBPAWFR wakes up from its periodic timeout and it's been "too long" with nothing from debugserver, SBPAWFR returns eStateInvalid.  The code handling that in the AsyncThread does:

              case eStateInvalid: {
                // Check to see if we were trying to attach and if we got back
                // the "E87" error code from debugserver -- this indicates that
                // the process is not debuggable.  Return a slightly more
                // helpful error message about why the attach failed.
                if (::strstr(continue_cstr, "vAttach") != nullptr &&
                    response.GetError() == 0x87) {
                  process->SetExitStatus(-1, "cannot attach to process due to "
                                             "System Integrity Protection");
                } else if (::strstr(continue_cstr, "vAttach") != nullptr &&
                           response.GetStatus().Fail()) {
                  process->SetExitStatus(-1, response.GetStatus().AsCString());
                } else {
                  process->SetExitStatus(-1, "lost connection");
                }
                break;
              }

So when an interrupt fails, we immediately give up on the process, setting the exit status to -1.  

Note, however, that this branch of the if doesn't set "done" to true.  The switch is inside a 'while (done)', so after setting the state to eStateInvalid, we go back to wait for another packet from the debug stub.  

SetExitStatus calls SetPrivateState(eStateExited), which will produce another event, and normally we'll handle that and shut down nicely.  But by returning to fetch another packet from debugserver, we're leaving ourselves open to getting a delayed stop we are no longer set up to deal with.

The background is that there's a longstanding but infrequent crash in lldb, which  I'm trying to fix.  I've never been able to reproduce it locally, so I only have the end state.  

The crashing thread is the internal-state thread.  It is crashing because it was handling a stop event, and went to ask the thread what to do, and the ThreadPlanStack is corrupt in some form or other.  The symptoms vary, but they all violate basic invariants in the ThreadPlanStack (like it will crash because it has no elements, which is never true).  At the time of the crash, there's always another thread which is some ways past the "wait for the process to stop" part of the Destroy operation on the process, and has begun tearing down Process objects.  That's pretty clearly why the ThreadPlanStack is in a bad state...

So far as I can see, the only way that could happen is if the attempt to interrupt timed out, but we didn't stop listening for packets, and didn't immediately shut down the internal state thread.  So then debugserver woke up, and sent the stop packet, and the internal state thread tried to process it, but we were already in mid Destroy, and that went poorly...

The fact that we go back for another packet after getting eStateInvalid seems to be what allows this to happen.

lldb has lacked the "done = true;" that I would have expected here since at least the great reformatting, so I don't have any history on this?  

I added the "done = true;" and ran the testsuite, and there weren't any new failures.  But this code is pretty complex, so there might be some reason for waiting to see if the stub returns another packet after the interrupt fails.  But I can't think of one.

Can anybody else think of a reason why not to set done to true here?

Jim