[lldb-dev] Locking issues

Thu Apr 18 19:17:51 PDT 2013

Hi Greg,

Thanks again for the detailed explanation of this dilly of a pickle. I
thought about this problem a little more.. It would be really nice to
resolve this soon, since the different locking behaviour on MacOSX vs
other platforms is something I think we all want to excise as quickly as
possible.

My thoughts are inline below:

>
>No, a python class gets instantiated when a process starts up and it gets
>a chance to run python code in order to make new threads lists up and
>relay them back to LLDB. The python code must use the public SB API.

Oh, that's somewhat surprising. Where is this class? I'm curious about
what it does that's done in Python.. Aren't you worried about the overhead
the Global Interpreter Lock might add to normal operations like step-over?
I have no data to back that up, but putting python stuff on the critical
path seems like it can't be good for performance. Also, not that it's
important to us, but does this also mean that building a working LLDB
without Python support is no longer an option?

>> To confirm my understanding, it seems that "regular"
>> python code (like the test cases that drive the debugger) which uses SB*
>> ends up locking the "public" locks whereas "callback" code that I
>>imagine
>> gets invoked via OSPython ends up locking the "private" locks -- is this
>> really true?
>
>Only when the private state thread calls into the python code which then
>wants to use the public API to lookup variables, evaluate expressions,
>etc in order to make the new thread list.
>
>> Are these two use-cases really running on different (main vs.
>> non-main) threads?
>
>When the private state thread is doing stuff, the process run lock is
>taken. Any public API access that requires a stopped process won't work
>when being done from the private state thread since the process is
>implementing some sort of run.

Right-O. This makes sense.

>
>> 
>> If so, once the locks are acquired, don't both of these use-cases end up
>> using the *same* shared Process data? Is there any coupling between the
>> data that the locks are protecting and the locks themselves? Based on
>>the
>> Process::GetRunLock() it seems like an arbitrary lock is chosen to
>>protect
>> all of Process' private data.
>
>No, the read/write locks don't protect the data, they just protect access
>to the process to make sure it stays stopped while clients access things,
>or make sure clients that want to read, stay locked out while process
>control is going on. See below for more detail.

OK, here's where I think things start going sideways. It might be possible
to accomplish the synchronization that's needed between the public clients
and internal callers with two R/W locks, but I'm having trouble seeing how
they are to be used in tandem. I'll attempt to explain (what I think is) a
simpler approach below.

>> 
>> If the public/public run locks *are* coupled to the public/private
>>states
>> of the process respectively, why is it that the both write locks need to
>> be kept for long periods of time? I can understand the approach of
>>having
>> an internal (private) "run" write lock being held while the process is
>> running, but I'm having trouble conceptualizing why the public run lock
>> needs to be held for similar durations. It seems any synchronization we
>>do
>> for the public state should be very short-held "critical section" type
>> locks while functions query the last-known process state, or just return
>> an error because the process is running. Any SB function that results
>>in a
>> public lock being held for longer than the duration of that function
>>seems
>> like a bug in the SB code -- does that make sense or am I way off?
>
>Since we do async process control, so when a user says "step over", this
>might result in the process being stopped and started 100 times. Before
>doing any process control, we take the public process run lock by
>acquiring the write lock. This lets any people currently in the process
>with the public run lock that was acquired for reading, to get out. Once
>the public run lock is taken, we keep it until the process run control
>action ("step over") is complete. We must lock down access to the process
>for the entire duration because it wouldn't be a good time for someone to
>come in and ask for the stack frames of a thread in between run 56 and
>57. It stack frames would be meaningless from a public perspective.
>Internally though, for any blessed sections of code (like
>OperatingSystemPython) which we know will need access to the public API
>from the private state thread, we allow them to not be locked out by
>using the private run lock.

Hmm, OK. So, in the two R/W lock approach, there's places where both locks
need to be write-acquired though, right? I'm not sure I can think of all
the other interactions between the locks though. It might be equivalent to
the thing I propose below -- can you confirm if this is true or not?

>>If we make new threads for callbacks, then we would either be locked out
>>from accessing the process, or we would need to let clients access the
>>process in between intermediate steps of the current process run
>>control, which is not what anyone wants.

Agreed that those two things are bad, but I think a third option exists:

What we need is akin to a condition variable (that knows how to relinquish
a lock during wait) so that we don't "lock ourselves out" and a shared
lock to protect the "public" interfaces that do anything with the process'
state, with some extra smartness about "long running" operations that
require nobody mess around with the process state.

Here's what I'm thinking Process needs to have in order to solve this, in
a little more concrete terms:
1. One bool (m_state_in_transition) -- true when we have a long-running
operation in progress
2. One Condition Variable bool (m_callback_in_progress) -- this remains
true when a python thread that has SB access is launched
3. One Mutex (m_lock)* -- to protect every function inside process that
reads/writes state

The rules of the game are:

- Every function in Process needs to acquire the m_lock before any
reads/writes happen. This is somewhat shitty because multiple reads would
be serialized, but more on this later*
- Any (non-const) function in Process (that modifies state like Resume(),
Step(), etc) needs to error out if m_state_in_transition == true. These
are forbidden in callbacks.
- Any const function in Process (that reads state but does not change it)
needs to acquire m_lock
- Any time an internal Process function needs to invoke some code that has
access to SB APIs, the procedure is:
-- set m_state_in_transition AND m_callback_in_progress to true
-- spawn a new thread that¹s going to use the public APIs
-- wait for m_callback_in_progress to turn false WITH m_mutex,
relinquishing the lock so others can enter Process internals, BUT
m_state_in_transition prevents any badness.
- When any "long-running" function exits, m_state_in_transition is set to
false.

Aside: As an implementation detail, the first and the last steps can
probably be combined in a single scoped object.

In concert with the above approach, any "callback" (or, 'blessed' I think
you call it) thread that has access to SB but may run during one of the
Process internals needs to set m_callback_in_progress to false when done;
thereby joining itself with the Process internal function that invoked it.

* Based on the existing design, it seems you want multiple reads to happen
concurrently (for Xcode performance reasons). To allow multiple readers to
access Process at once, upgrade the m_lock from a run-of-the-mill Mutex to
a R/W lock. The problem with this is that we need the condition variable
to play nice (i.e. unlock the R/W lock during wait().) Pthread doesn't
support this with the rw_lock type as-is, but I'm sure it can be done if
we implement ReadWriteLock the "hard" way with lower-level primitives. See
http://stackoverflow.com/questions/2699993/using-pthread-condition-variable
-with-rwlock for inspiration.

The natural benefit of the above approach is that all the
locking/unlocking magic happen inside Process, and SB has to have 0
knowledge about Process locks.

>> Is it not possible to have both "driving" code and "callback"
>> code go through exactly the same code path the SB* layer as well as the
>> Process class?
>
>You mean using the same locks? I am confused by your question. All code
>uses the same public API through the SB* layer. The SB layer locks you
>out when it is dangerous to do things while the process is running. Many
>of the issues we ran into stemmed from Xcode accessing the SB* API from
>multiple threads and all of the locks (the target lock and the process
>read/write (stopped/running) locks are needed.

Sorry I was not too clear -- basically I'm worried about the complexity of
having two discrete R/W locks here, and separate behaviour based on
internal/external threads. Since processes have one state, it "feels" like
there should be one lock.

>Let me know what you think after reading the above explanation. The locks
>are needed and necessary and we haven't been able to come up with another
>way to control access to the things we need to do from the various
>threads. We currently have two clients:
>- public clients
>- private process state thread clients
>
>public clients get locked out, as they should, when the process is
>running (continue, or stepping) so we don't get useless answers.
>
>Do you really want to evaluate the value for "x" when we are in the
>middle of a single step? No you want to get an error stating "error:
>process is running".

Right, expression evaluation would be one of those "non-const" cases that
should error out if Process m_in_transition is true.

> 
>Do you really want stack frames for all threads in a process when the
>process is in the middle of a single step? No you want to get an error
>stating "error: process is running".

Right-O.

>
>The private clients though are limited to OperatingSystemPython right
>now, but could be expanded to breakpoint actions in the near future. A
>breakpoint might say it wants to have python run when the breakpoint gets
>hit and it would be great to not have to do this on another thread just
>so we can see that "x != 12", or "rax != 123".

I'm not sure another thread for the callback will add a lot of overhead
compared to the overhead of the Python GIL. But I'm open here -- maybe
there is a way to do the condition variable thing (or equivalent) without
spawning another thread. We'd need a mutex-unlock-and-invoke-function sort
of functionality, but the general principle would be the same; when
non-Process code is running, all locks should be released, but the Process
object should be in a state whereby it cannot be changed (I.e. certain
functions are forbidden).

>Another thing we have to watch out for is that there are limitations on
>things that can be done on the private state thread. For example you
>wouldn't want to try to evaluate an expression (run the target) while you
>are currently running the target for another expression. There is special
>code for this for one special case (calling "mmap" to allocate memory by
>manually calling a function without using the expression parser), but
>those don't use any of the special locking we have implemented.

Since evaluating an expression can modify the process (for example
'expression -- i++') it should be treated as non-const unless some IR
interpreter can guarantee otherwise (in a future optimization utopia.) It
seems safe to turn the process m_in_transition to be true during
evaluation of any expression, thereby preventing anything else from
modifying or reading from Process. Of course, internally, the
expression-evaluation would need to have it's own UncheckedResume()
functionality that resumes the inferior without checking m_in_transition
like the API Resume() would.

>> Another part that complicates reviewing this code is the external
>>locking
>> -- wouldn't it be way better to encapsulate all the locking/unlocking
>> inside the Process class? I like code that keeps the lock code close to
>> the data they protect; what's the benefit of having SB* API classes
>> acquire and maintain internal Process locks?
>
>You might need to do multiple things. Like to get a backtrace for all
>threads. Do you want to ask the process for thread at index 0, then
>backtrace it. Then another thread resumes the process and it stops
>somewhere else. Now you ask for thread at index 1 and you get a different
>thread because the thread list has changed. Not a great experience. This
>is why we externally lock.

Agreed; it would be a terrible bug.

>>I'd prefer to see this whole Process locking thing redesigned
>> with internal rather than external locking. I think this will help nail
>> down exactly what cases are causing the double-locking thing to
>>deadlock.
>
>This works great for single access functions, but falls down for complex
>actions (like backtracking all threads).

I don't think external locking is the only way to accomplish this. What's
wrong with a Process::BacktraceAllThreads() (or even
Process::BacktraceSomeThreads(ThreadIDs)) function that internally
acquires the run lock and does its business without fear of somebody
continuing the process in the meantime? In BacktraceAllThreads(), if the
process is in transition, that function should just fail. If the process
is stopped, any other non-const calls (to Resume() or
expression-evaluation) should block until BacktraceAllThreads() returns,
or error out. Glancing at the code that calls GetRunLock() in the SB
layer, it seems like it all belongs in the Process class. I understand the
hesitation to add more stuff to Process, since the class is already
getting kinda large, but I think we should worry about splitting it up
into more more bite-sized components after this deadlock business is
sorted out.

>We are open to any ideas you have for robust locking. We have tossed
>around a few ideas for the process control, but haven't come up with one
>that works better than what we have. Having a multi-threaded API sure
>does complicate things and is something other debuggers don't have to
>worry about.

Definitely does complicate things just a smidge ;) I suppose it's too late
to require that Xcode uses lldb in a more disciplined single-threaded
mode? ;) j/k

Cheers,
Dan