[lldb-dev] Locking issues

Thu Apr 18 11:47:57 PDT 2013

On Apr 18, 2013, at 11:05 AM, "Malea, Daniel" <daniel.malea at intel.com> wrote:

> Hi Greg, thanks for the explanation.
> 
> One part I am not sure of is where OperatingSystemPython uses the SB
> APIs.. You're referring to user-code, right?

No, a python class gets instantiated when a process starts up and it gets a chance to run python code in order to make new threads lists up and relay them back to LLDB. The python code must use the public SB API.

> I am not seeing anything in
> the OperatingSystemPython.[cpp|h] (even the update thread list function)
> that uses SB* stuff.

Extactly, see above.

> To confirm my understanding, it seems that "regular"
> python code (like the test cases that drive the debugger) which uses SB*
> ends up locking the "public" locks whereas "callback" code that I imagine
> gets invoked via OSPython ends up locking the "private" locks -- is this
> really true?

Only when the private state thread calls into the python code which then wants to use the public API to lookup variables, evaluate expressions, etc in order to make the new thread list.

> Are these two use-cases really running on different (main vs.
> non-main) threads?

When the private state thread is doing stuff, the process run lock is taken. Any public API access that requires a stopped process won't work when being done from the private state thread since the process is implementing some sort of run.

> 
> If so, once the locks are acquired, don't both of these use-cases end up
> using the *same* shared Process data? Is there any coupling between the
> data that the locks are protecting and the locks themselves? Based on the
> Process::GetRunLock() it seems like an arbitrary lock is chosen to protect
> all of Process' private data.

No, the read/write locks don't protect the data, they just protect access to the process to make sure it stays stopped while clients access things, or make sure clients that want to read, stay locked out while process control is going on. See below for more detail.

> 
> If the public/public run locks *are* coupled to the public/private states
> of the process respectively, why is it that the both write locks need to
> be kept for long periods of time? I can understand the approach of having
> an internal (private) "run" write lock being held while the process is
> running, but I'm having trouble conceptualizing why the public run lock
> needs to be held for similar durations. It seems any synchronization we do
> for the public state should be very short-held "critical section" type
> locks while functions query the last-known process state, or just return
> an error because the process is running. Any SB function that results in a
> public lock being held for longer than the duration of that function seems
> like a bug in the SB code -- does that make sense or am I way off?

Since we do async process control, so when a user says "step over", this might result in the process being stopped and started 100 times. Before doing any process control, we take the public process run lock by acquiring the write lock. This lets any people currently in the process with the public run lock that was acquired for reading, to get out. Once the public run lock is taken, we keep it until the process run control action ("step over") is complete. We must lock down access to the process for the entire duration because it wouldn't be a good time for someone to come in and ask for the stack frames of a thread in between run 56 and 57. It stack frames would be meaningless from a public perspective. Internally though, for any blessed sections of code (like OperatingSystemPython) which we know will need access to the public API from the private state thread, we allow them to not be locked out by using the private run lock.

> 
> Another question that arises is why does python code need to run on the
> same main internal thread?

Any thread other than the private state thread would use the public lock. Any code run from the private state thread itself, since it controls the process, can be guaranteed that the process is and will stay stopped while accessing the data. So just code run from the private state thread is special for anything that goes out to any code that would need to use the public SB API.

> I would see SB* callers as users of the Process
> class and as such they should only be allowed to touch the "public" state
> and never run on the internal thread. I would imagine that LLDB would
> spawn new threads for callbacks instead of reusing internal monitoring
> threads.
If we did this, we would end up locking ourselves out from being able to use the code. We don't want the people getting in and being able to do stuff on the process in between stops when we are implementing a "step over". If we make new threads for callbacks, then we would either be locked out from accessing the process, or we would need to let clients access the process in between intermediate steps of the current process run control, which is not what anyone wants. (if a client says "process.Continue()" immediately followed by code that gets stack frames for all threads, we don't want stack frames from the middle of a single step.

> Is it not possible to have both "driving" code and "callback"
> code go through exactly the same code path the SB* layer as well as the
> Process class?

You mean using the same locks? I am confused by your question. All code uses the same public API through the SB* layer. The SB layer locks you out when it is dangerous to do things while the process is running. Many of the issues we ran into stemmed from Xcode accessing the SB* API from multiple threads and all of the locks (the target lock and the process read/write (stopped/running) locks are needed.

> My thinking is that both of those use cases should be
> treated the same and interact with only the "public" Process state, while
> an internal thread does the lower level Process work of launching things
> and responding to events from the process, etc.

Let me know what you think after reading the above explanation. The locks are needed and necessary and we haven't been able to come up with another way to control access to the things we need to do from the various threads. We currently have two clients:
- public clients
- private process state thread clients

public clients get locked out, as they should, when the process is running (continue, or stepping) so we don't get useless answers.

Do you really want to evaluate the value for "x" when we are in the middle of a single step? No you want to get an error stating "error: process is running". 
Do you really want stack frames for all threads in a process when the process is in the middle of a single step? No you want to get an error stating "error: process is running".

The private clients though are limited to OperatingSystemPython right now, but could be expanded to breakpoint actions in the near future. A breakpoint might say it wants to have python run when the breakpoint gets hit and it would be great to not have to do this on another thread just so we can see that "x != 12", or "rax != 123". 

Another thing we have to watch out for is that there are limitations on things that can be done on the private state thread. For example you wouldn't want to try to evaluate an expression (run the target) while you are currently running the target for another expression. There is special code for this for one special case (calling "mmap" to allocate memory by manually calling a function without using the expression parser), but those don't use any of the special locking we have implemented.

> Another part that complicates reviewing this code is the external locking
> -- wouldn't it be way better to encapsulate all the locking/unlocking
> inside the Process class? I like code that keeps the lock code close to
> the data they protect; what's the benefit of having SB* API classes
> acquire and maintain internal Process locks?

You might need to do multiple things. Like to get a backtrace for all threads. Do you want to ask the process for thread at index 0, then backtrace it. Then another thread resumes the process and it stops somewhere else. Now you ask for thread at index 1 and you get a different thread because the thread list has changed. Not a great experience. This is why we externally lock.

> 
> I agree completely about avoiding Boost-bloat; the concern I had there was
> that there's no tests for any of the R/W locking support classes.. I
> concede that on further studies of the Boost APIs, they won't really solve
> the double-locking or double-unlocking problem you're trying to solve --
> although I'd prefer to see this whole Process locking thing redesigned
> with internal rather than external locking. I think this will help nail
> down exactly what cases are causing the double-locking thing to deadlock.

This works great for single access functions, but falls down for complex actions (like backtracking all threads).

We are open to any ideas you have for robust locking. We have tossed around a few ideas for the process control, but haven't come up with one that works better than what we have. Having a multi-threaded API sure does complicate things and is something other debuggers don't have to worry about.
> 
> 
> 
> Thanks for trying to get a Linux machine up and running; we definitely
> appreciate it!!
> 
> I have never tried getting a build working in Parallels though, so you
> might have a better time with Bootcamp. The segfault during linking
> definitely sounds like an out-of-memory situation. As for Ubuntu, 12.04 is
> a little out-dated now; I recommend using 12.10 as that has GCC 4.7 which
> is compatible with both configure and cmake-based builds.

Nice! I will install that and give it a go!

> If you do need
> to use 12.04 and you're stuck on GCC 4.6, I recommend using cmake instead
> of configure as that system is smart enough to pass the right C++11 flag
> (-std=c++0x) to older versions of GCC as opposed to ./configure
> --enable-cxx11 which blindly passes in (-std=c++11). Also, cmake+ninja
> builds LLDB much faster (by about 1min) than the configure based system.

Thanks for the info. I will let you know how 12.10 works out.
> 
> As Ashok mentioned, if you'd like to use clang, llvm.org/apt is the place
> to get bleeding edge packages for Ubuntu and Debian. Personally, I use
> both GCC and Clang to keep things sane across the board. Both buildbots
> seem to build the project right now.a

If 12.10 fails, I will try clang.

Let me know what you think of the new explanations above and if you have any ideas on controlling the API in a different better way.

Greg

> 
> Cheers,
> Dan
> 
> 
> 
> 
> On 2013-04-17 8:25 PM, "Greg Clayton" <gclayton at apple.com> wrote:
> 
>> We currently need to avoid doing things while the process is running.
>> There are two cases we care about:
>> - the public state tracking when we are running
>> - the private state tracking when we are running
>> 
>> The main reason we need this is the private process state thread handles
>> some complex things for us when it is handling the process. One example
>> is the OperatingSystemPlugins (like OperatingSystemPython) where it may
>> get called from the private process state thread to update the thread
>> list. A common thing to do in the OperatingSystemPython is to read a
>> global list in the kernel that contains the thread list and follow a
>> linked list. If we run and need to determine if we should stop, we often
>> need to update our thread list. This update will happen on the private
>> process thread. So the flow goes like this:
>> 
>> The old problem was:
>> 
>> 1 - (main thread) user says "step over"
>> 2 - (main thread) initiates the process control and the public process
>> write lock is taken
>> 3 - (private process thread) run and stop after each "trace" while doing
>> the single step
>> 4 - (private process thread) updates the thread list which calls into the
>> OperatingSystemPython which wants to use the public LLDB API
>> 5 - (private process thread) goto 3 until step is done
>> 
>> The problem is step 4 fails because the OperatingSystemPython used
>> lldb::SB API's that require the public process write lock in order to
>> evaluate expressions and use anything that requires that the process is
>> stopped.
>> 
>> To get around this we introduced the private read/write process lock to
>> track when the process state thread is stopped so we can actually use the
>> public APIs. So the flow is now:
>> 
>> 1 - (main thread) user says "step over"
>> 2 - (main thread) initiates the process control and the public process
>> write lock is taken
>> 3 - (private process thread) lock private process write lock
>> 4 - (private process thread) run and stop after each "trace" while doing
>> the single step
>> 5 - (private process thread) unlock private process write lock
>> 6 - (private process thread) updates the thread list which calls into the
>> OperatingSystemPython which wants to use the public LLDB API
>> 7 - (private process thread) goto 3 until the step is done
>> 
>> This lets us use the public APIs by allowing the private process state
>> thread to lock a different lock and manage when the private state thread
>> is locked.
>> 
>> This is a problem for other things that use python during the lifetime of
>> the process. For instance, we want to eventually have some python code
>> that gets called when a process is about the resume, or just after it
>> stops. We would like to simplify the code for breakpoints that have
>> commands that get run when the breakpoint is hit (right now we defer any
>> actions until the user consumes the public stop event).
>> 
>> 
>>> You mentioned that you'll improve the R/W (scoped?) locking classes..
>>> Any
>>> reason to not use boost (or some other C++11 library) for this? If we do
>>> have to roll our own in LLDB, the lack of tests is worrisome.
>> 
>> I am not a big fan of boost as it bloats the C++ program debug info to be
>> so large that it often makes debugging the boost programs very difficult
>> due to the shear size of the debug info. Most of what we cared about from
>> boost is now in C++11. Even if we did use boost, would it actually check
>> to see if the lock was taken prior to trying to release it? The APIs on
>> read/write locks are dead simple, so I don't see this is a reason to use
>> boost.
>> 
>>> If the improvements to the R/W locker classes you've got in progress
>>> don't
>>> allow the test suite to run to completion, could you please revert
>>> 179329
>>> until we have something that allows us to run the tests? Lots of patches
>>> are backed up atm due to the LLVM practice of not committing on top of a
>>> broken trunk.
>> 
>> Yes, I am trying to get us access to a linux machine that we can all use
>> here at Apple so we can debug and fix the things we break.
>> 
>> I spent a large part of the weekend trying to get Ubuntu 12.04 (using
>> Parallels Desktop (virtualization software)) building llvm/clang/lldb so
>> that I can fix these issues. I wasn't able to get clang to build as the
>> link stage would always get killed with a signal 9. Not sure why, maybe
>> the virtualization software was running out of RAM or resources. The
>> build instructions up on the web for Linux don't actually work on a fresh
>> install of Ubuntu. I needed to install new packages for tools essentials
>> and also install gcc-4.7 and try to figure out how to get LLVM to use
>> these compilers to get things to build with C++11, otherwise the build
>> wouldn't even configure with gcc-4.6 due to the --enable-libcpp quickly
>> stating of of the options wasn't supported by the compiler.
>> 
>> So the linux builds are frustrating to try and get working, but I do want
>> everyone to know that I am trying.
>> 
>> What compiler do you build with on linux? Are there packages to install
>> for a suitable version of clang? I finally gave up after many many hours
>> of trying to get lldb to build.
>> 
>> Greg
>> 
>>> 
>>> 
>>> Dan
>>> 
>>> PS. The hanging buildbots to watch are:
>>> 
>>> http://lab.llvm.org:8011/builders/lldb-x86_64-darwin11/builds/1890
>>> http://lab.llvm.org:8011/builders/lldb-x86_64-debian-clang
>>> 
>>> http://lab.llvm.org:8011/builders/lldb-x86_64-linux
>>> 
>>> 
>>> On 2013-04-17 12:47 PM, "Greg Clayton" <gclayton at apple.com> wrote:
>>> 
>>>> 
>>>> On Apr 17, 2013, at 1:27 AM, Carlo Kok <ck at remobjects.com> wrote:
>>>> 
>>>>> I'm trying to update the Windows branch to the latest and greatest and
>>>>> found these locking issues (not sure if they're relevant for posix
>>>>> too):
>>>>> 
>>>>> When I attach a process (I only use the gdb remote) the first even I
>>>>> get is "stopped" which tries to unlock m_private_run_lock, however
>>>>> this
>>>>> one is never locked in the first place. Windows' writelock doesn't
>>>>> appreciate that; as a workaround I added a
>>>>> m_private_run_lock.WriteLock(); in Process' constructor, which seems
>>>>> to
>>>>> fix that.
>>>> 
>>>> We need to fix this better by locking the private run lock when
>>>> attaching
>>>> if all goes well.
>>>> 
>>>>> 
>>>>> The second issue occurs when when trying to cause a "Stop" when it's
>>>>> already paused on internal breakpoints; for me this is during slow
>>>>> symbol load. When happens is that the loading (which happens from
>>>>> within
>>>>> Process::ShouldBroadcastEvent) resumes it, then the process exits
>>>>> properly (triggers the ShouldBroadcastEvent again) however:
>>>>> 
>>>>> ProcessEventData::DoOnRemoval(lldb_private::Event * event_ptr)
>>>>> called by Listener::FindNextEventInternal.
>>>>> 
>>>>> The resume call is in this condition:
>>>>> if (state != eStateRunning)
>>>> 
>>>> Where is the above "if (state != eStateRunning)"?
>>>> 
>>>>> Changing that to:
>>>>> lldb::StateType state = m_process_sp->GetPrivateState();
>>>>> if (state != eStateRunning && state != eStateCrashed && state !=
>>>>> eStateDetached && state != eStateExited)
>>>> 
>>>> There are functions that indicate if the function is stopped or
>>>> running.
>>>> We should use those functions. (search for "StateIsStopped").
>>>> 
>>>>> 
>>>>> Seems to fix it, as there's no reason to try & resume a process that's
>>>>> not running in the first place (and since exiting doesn't unlock a
>>>>> process this causes a deadlock)
>>>>> 
>>>>> The last issue is this:
>>>>> void * Process::RunPrivateStateThread ()
>>>>> does : m_public_run_lock.WriteUnlock(); when it's done. The Finalize
>>>>> also unlocks that same lock, which Windows crashes on.
>>>>> commenting that out and it seems to work stable.
>>>> 
>>>> We need to build in some smarts into our Read/Write locking class to
>>>> know
>>>> if the read/write lock is taken and only unlock if the corresponding
>>>> read/write lock is locked. I will make this change today.
>>>> 
>>>>> 
>>>>> 
>>>>> Anyone see any issues in all of this? (might make sense to apply this
>>>>> to trunk too; it's never good to have unbalanced lock/unlocks)
>>>>> _______________________________________________
>>>>> lldb-dev mailing list
>>>>> lldb-dev at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>>> 
>>>> _______________________________________________
>>>> lldb-dev mailing list
>>>> lldb-dev at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
>>> 
>> 
>