[lldb-dev] Reporting the results of "thread select"
Jim Ingham via lldb-dev
lldb-dev at lists.llvm.org
Wed Dec 18 15:38:02 PST 2019
We were seeing an ASAN failure in a simple Lit test somebody here was writing. The failure was that the "thread backtrace" command was accessing freed memory dereferencing a VariableSP it got from the VariableList of a Block. Turns out that the VariableList class has no locking, and backtrace was getting a variable from the block's list while another thread was in the process of filling in the VariableList of that block from DWARF.
So that bug is easy, there's nothing saying that you can't ask a question of a block on two different threads, nor any way to gate setting up the VariableList before another thread accesses it that I can see. It seems like the VariableList should lock when it adds or retrieves elements.
But I was curious how we managed to be filling and accessing the VariableList simultaneously. The answer struck me as odd:
The command before "thread backtrace" in the Lit test was "thread select 1". It doesn't make sense that this would cause any access problems, since "thread select 1" just selects a thread, and prints the Status message from the newly selected thread. That latter bit will print the arguments, so it will access the VariableList. But it should have been done accessing that by the time the "thread backtrace" command was run. Or so it seems...
What CommandObjectThreadSelect in fact does is call "SetSelectedThreadByIndex" with "notify" set to "true" - which raises an eBroadcastBitThreadSelected event. Then the command returns success but no result. Then the Debugger's event handler gets the eBroadcastBitThreadSelected in its DefaultEventHandler, which calls Thread::GetStatus, putting the results into the debugger's Async output stream. And that's how what looks like the output from "thread select" actually gets printed.
This was causing the race because the event handling is not synchronized with command handling. I'm not sure whether it would be a good idea to handle commands by having the Debugger post a "got a new line" event to itself, and have the handling of that event trigger the command execution. But that's not how it works today. IOHandlerEditLine directly dispatches a command when it gets a complete line. So the output from the "thread select" status printing could happen before, during or after the next command is processed.
That's actually a little problematic if you are writing Lit type tests that scrape lldb output because the thread selection message isn't guaranteed to happen before the next command is fetched. So you could occasionally get the output from the command after a "thread select" mixed in with the thread status message.
That's also not how we handle the "frame select" command which should be equivalent. In that case, we still send an event (we always have to do that so UI's can keep in sync with the command line) but the Debugger doesn't listen for that event, instead we put the status result directly into the command result object.
I tried to figure out why it was done this way but there anything in the scm history that shed any light on this.
This seems like a weird way to do things, and leaves a little land mine for people because something that looks very much like the output of the "thread select" command is in fact NOT it's output. So I'd like to make "thread select" work like "frame select". But for that to work, I would have to remove the code in Debugger::HandleThreadEvent that handles the eBroadcastBitThreadSelected event. If something else was depending on this to print status, that would get broken. Note, the SB API's don't send events when they change the Stack or the Frame. These are supposed to be the API's for driving lldb, and having to say "Thread.Select" then wait for an event saying the thread was selected seems like an obnoxious API... So we don't need to worry about that. And the testsuite was clean.
I was just wondering if anybody sees some reason I'm missing for why it would be done this way?
If not, I'll just change it, and we'll see if anything I don't know about breaks...
More information about the lldb-dev