[Lldb-commits] [PATCH] D102085: Add an "interrupt timeout" to Process, fix a bug in handling interrupt timeouts in
Jim Ingham via Phabricator via lldb-commits
lldb-commits at lists.llvm.org
Fri May 7 11:26:52 PDT 2021
jingham created this revision.
jingham added reviewers: clayborg, labath, JDevlieghere.
jingham requested review of this revision.
Herald added a project: LLDB.
Herald added a subscriber: lldb-commits.
This patch started as an attempt to rationalize the handling of the timeout for interrupting a running process. Process::Halt sends a 10 second timeout to WaitForProcessToStop, but that timeout didn't have any effect on how the interrupt was actually handled in ProcessGDBRemote & its GDBRemoteClient client, and there was no way to plumb the desired timeout to that layer.
This patch adds an interrupt-timeout setting to process and plumbs that through.
That then allowed me to write some more tests on the interrupt handling, in the course of which I found a bug in the handling of the interrupt timeout.
The lowest level waiting on the remote stub doesn't block when the process is running. Instead, it passes in a 5 second wakeup timeout when it calls ReadPacket. That's there so we can detect if the connection has dropped, but in the current code it is also the de facto interrupt timeout as well.
When ProcessGDBRemote wants to interrupt a running process, it sends a break to debugserver, and sets some ivars that say "interrupt is in flight" and to provide the "interrupt timepoint" to the client.
When the GDBRemoteClientBase object wakes up from ReadPacket because of the timeout, it checks whether there was an interrupt in flight, and if so, next checks whether 5 seconds have elapsed since the interrupt request. If that latter test is NOT true, it is supposed to go back into ReadPacket to wait out the remaining time (*). But it flubs this part. Instead of calling "continue" to go back to top of the ReadPacket loop, it just breaks from the "result checking" switch. The next thing after the switch is a check to see if the response packet is empty, and if it is empty, we exit from the ReadPacket loop, returning eStateInvalid - the same state as if the interrupt had failed.
But if we woke up before the interrupt had a chance to take effect, there won't be any response yet, and so we error out. That looks counter to the intention of the code that was checking the timeouts. It's bad because it means that if you are unlucky and happened to request the interrupt just before the ReadPacket call exceeds the timeout, you could in fact end up waiting only a very short time for the interrupt.
I have lots of reports of this happening, but never reproducibly. The reports always come in contexts where lots of processes were being started and stopped quickly. For instance, Xcode runs all its test plans in the debugger when not running in a test harness, and runs the tests in parallel. So it was starting & stopping a lot of processes in parallel, and that this should occasionally cause the system to stall didn't seem altogether unreasonable. That was the motivation for adding the interrupt timeout in the first place. That still may be true, but it looks like this bug was making the chance of exceeding the timeout more likely.
I still think having the timeout is valuable, however, if for no other reason than that was how I was able to write a test for the behavior that triggered this bug...
(*) BTW, this is not an exact timer. We don't have a way to stop ReadPacket & readjust the wakeup time if the interrupt time happens to be shorter. And when ReadPacket wakes up and finds we have an interrupt pending, but haven't waited long enough, formally we should change our timeout to be the amount left to wait for the interrupt. I didn't change this behavior as I don't think we need to guarantee that kind of exactness. As long as we wait at least as long as the timeout requested, I think we're doing what folks want, and I don't think being more precise is worth the added complexity.
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D102085
Files:
lldb/include/lldb/Target/Process.h
lldb/source/Plugins/Platform/gdb-server/PlatformRemoteGDBServer.cpp
lldb/source/Plugins/Process/gdb-remote/GDBRemoteClientBase.cpp
lldb/source/Plugins/Process/gdb-remote/GDBRemoteClientBase.h
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.h
lldb/source/Plugins/Process/gdb-remote/GDBRemoteRegisterContext.cpp
lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
lldb/source/Target/Process.cpp
lldb/source/Target/TargetProperties.td
lldb/test/API/functionalities/gdb_remote_client/TestHaltFails.py
lldb/unittests/Process/gdb-remote/GDBRemoteClientBaseTest.cpp
lldb/unittests/Process/gdb-remote/GDBRemoteCommunicationClientTest.cpp
lldb/unittests/tools/lldb-server/tests/TestClient.cpp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D102085.343721.patch
Type: text/x-patch
Size: 81088 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/lldb-commits/attachments/20210507/616b4c5e/attachment-0001.bin>
More information about the lldb-commits
mailing list