[lldb-dev] Locking issues on windows

Thu Apr 18 06:56:18 PDT 2013

> With your fixes in r179378, the test suite runs to completion on one of my test machines.
This worked consistently when using ssh over a slower VPN connection.  The suite hangs consistently when working at the test machine.  I see that the Darwin buildbot also reports hangs.

- Ashok

From: lldb-dev-bounces at cs.uiuc.edu [mailto:lldb-dev-bounces at cs.uiuc.edu] On Behalf Of Thirumurthi, Ashok
Sent: Wednesday, April 17, 2013 11:32 PM
To: lldb-dev at cs.uiuc.edu
Subject: Re: [lldb-dev] Locking issues on windows

> Are there packages to install for a suitable version of clang?

FYI Greg, you can download freshly built packages for Debian or Ubuntu using the instructions at http://llvm.org/apt/.  Basically, for Ubuntu 12.04, you can just add the following line to /etc/apt/sources.list:

      deb http://llvm.org/apt/precise/ llvm-toolchain-precise main

and then run

      sudo apt-get install clang-3.3

That should give you a tool-chain to build llvm, clang and lldb from source.  Thanks in advance for all your efforts to get a Linux machine in operation.  Say, is this something that you plan to include in pre-commit testing when operational?

> could you please revert 179329 until we have something that allows us to run the tests?

With your fixes in r179378, the test suite runs to completion on one of my test machines.  However, both lldb buildbots for Linux continue to time out <http://lab.llvm.org:8011/builders/lldb-x86_64-debian-clang/builds/2297/steps/test%20lldb/logs/stdio> when running the test suite.  The buildbots are clearly helpful to identify the commits that introduce new regressions.  We'll look into the buildbots more tomorrow.

Let us know if you have any concerns with reverting the two lock-related commits (if needed) until we have something more stable.

Thanks!

- Ashok

-----Original Message-----
From: lldb-dev-bounces at cs.uiuc.edu [mailto:lldb-dev-bounces at cs.uiuc.edu] On Behalf Of Greg Clayton
Sent: Wednesday, April 17, 2013 8:26 PM
To: Malea, Daniel
Cc: lldb-dev at cs.uiuc.edu
Subject: Re: [lldb-dev] Locking issues on windows

On Apr 17, 2013, at 4:01 PM, "Malea, Daniel" <daniel.malea at intel.com> wrote:

> So, it looks like the locks are going awry in several places.
>
> Carlo, I can confirm that your fix resolves some of the hangs that
> everyone is experiencing but not all. Specifically, the
> TestInlineStepping.py seems to still deadlock on the acquisition of
> one of the Process (public) locks during a Resume(). That said,
> toggling the lock in the constructor doesn't seem like a sound workaround..

Agreed, this shouldn't be the fix we use. We should track when we are doing an attach and lock it when the attach starts.

> Greg,
>
> 179329 is the commit that seems to have made things go all sideways.
> After that commit, no Debian users can install a package that doesn't
> deadlock on startup, we have no visibility into the testing status on
> the buildbots, and the commit itself seems scary as it exposes a
> reference to one of two internal locks to users based on what thread they're running in.
>
> After briefly studying the Process class, I'm a little worried about
> the complexity of the design. Could you explain the reason 2 different
> R/W locks are needed? I understand why one R/W lock makes sense in the
> class, but two seem overly complicated.

We currently need to avoid doing things while the process is running. There are two cases we care about:

- the public state tracking when we are running

- the private state tracking when we are running

The main reason we need this is the private process state thread handles some complex things for us when it is handling the process. One example is the OperatingSystemPlugins (like OperatingSystemPython) where it may get called from the private process state thread to update the thread list. A common thing to do in the OperatingSystemPython is to read a global list in the kernel that contains the thread list and follow a linked list. If we run and need to determine if we should stop, we often need to update our thread list. This update will happen on the private process thread. So the flow goes like this:

The old problem was:

1 - (main thread) user says "step over"

2 - (main thread) initiates the process control and the public process write lock is taken

3 - (private process thread) run and stop after each "trace" while doing the single step

4 - (private process thread) updates the thread list which calls into the OperatingSystemPython which wants to use the public LLDB API

5 - (private process thread) goto 3 until step is done

The problem is that step 4 fails because the OperatingSystemPython used lldb::SB APIs that require the public process write lock in order to evaluate expressions and use anything that requires that the process is stopped.
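
The old flow can be modeled in a few lines of C++17. This is only a sketch under assumptions: RunLock, PublicApiCall and OldFlowStep4Succeeds are hypothetical names, and std::shared_mutex stands in for LLDB's own read/write lock class. It shows the private process thread being refused read access while the main thread still holds the write lock for the step.

```cpp
#include <cassert>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-in for the single public run lock (not LLDB's class).
struct RunLock {
    std::shared_mutex mutex;
};

// Models a public API call: it only proceeds if the process counts as
// stopped, i.e. if a read lock can be taken without blocking.
bool PublicApiCall(RunLock &lock) {
    std::shared_lock<std::shared_mutex> reader(lock.mutex, std::try_to_lock);
    return reader.owns_lock();
}

// Replays steps 1-4 of the old flow and reports whether step 4 succeeded.
bool OldFlowStep4Succeeds() {
    RunLock public_lock;
    public_lock.mutex.lock();            // step 2: main thread takes the write lock
    bool api_ok = true;
    std::thread private_thread([&] {     // steps 3-4: private process thread
        api_ok = PublicApiCall(public_lock);
    });
    private_thread.join();
    public_lock.mutex.unlock();          // the step is done
    return api_ok;
}
```

In this model OldFlowStep4Succeeds() returns false: the call made from the private process thread is refused for as long as the step holds the public write lock.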

To get around this we introduced the private read/write process lock to track when the process state thread is stopped so we can actually use the public APIs. So the flow is now:

1 - (main thread) user says "step over"

2 - (main thread) initiates the process control and the public process write lock is taken

3 - (private process thread) lock private process write lock

4 - (private process thread) run and stop after each "trace" while doing the single step

5 - (private process thread) unlock private process write lock

6 - (private process thread) updates the thread list which calls into the OperatingSystemPython which wants to use the public LLDB API

7 - (private process thread) goto 3 until the step is done

This lets us use the public APIs by allowing the private process state thread to lock a different lock and manage when the private state thread is locked.
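
A minimal sketch of the two-lock flow, under stated assumptions: TryRWLock, ProcessModel, StepResult and RunOneTraceStep are hypothetical names made up for illustration, and choosing the lock by calling thread follows the description earlier in this thread of handing out one of two locks based on what thread the caller is running in. It is a model of the idea, not LLDB's implementation.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Minimal try-only read/write lock (hypothetical stand-in for ReadWriteLock).
class TryRWLock {
public:
    bool TryWriteLock() {
        std::lock_guard<std::mutex> guard(m_state);
        if (m_writer || m_readers > 0) return false;
        m_writer = true;
        return true;
    }
    void WriteUnlock() {
        std::lock_guard<std::mutex> guard(m_state);
        m_writer = false;
    }
    bool TryReadLock() {
        std::lock_guard<std::mutex> guard(m_state);
        if (m_writer) return false;
        ++m_readers;
        return true;
    }
    void ReadUnlock() {
        std::lock_guard<std::mutex> guard(m_state);
        if (m_readers > 0) --m_readers;
    }
private:
    std::mutex m_state;
    bool m_writer = false;
    int m_readers = 0;
};

// Hypothetical process model holding the public and private run locks.
struct ProcessModel {
    TryRWLock public_run_lock;
    TryRWLock private_run_lock;
    std::thread::id private_thread_id;

    // The private process thread consults the private lock, everyone else the public one.
    TryRWLock &GetRunLock() {
        return std::this_thread::get_id() == private_thread_id ? private_run_lock
                                                               : public_run_lock;
    }
    // Models a public API call that needs the process to be stopped.
    bool PublicApiCall() {
        TryRWLock &lock = GetRunLock();
        if (!lock.TryReadLock()) return false;
        lock.ReadUnlock();
        return true;
    }
};

struct StepResult {
    bool private_thread_api_ok;
    bool main_thread_api_ok;
};

// Replays steps 1-6 of the new flow for a single "trace".
StepResult RunOneTraceStep() {
    ProcessModel process;
    StepResult result{false, true};
    const bool locked = process.public_run_lock.TryWriteLock();    // step 2
    assert(locked);
    (void)locked;
    std::thread private_thread([&] {
        process.private_thread_id = std::this_thread::get_id();
        process.private_run_lock.TryWriteLock();                    // step 3
        // step 4: the process runs and stops after one trace
        process.private_run_lock.WriteUnlock();                     // step 5
        result.private_thread_api_ok = process.PublicApiCall();     // step 6
    });
    private_thread.join();
    result.main_thread_api_ok = process.PublicApiCall();  // public lock still write-held
    process.public_run_lock.WriteUnlock();
    return result;
}
```

In this model the call made in step 6 succeeds on the private process thread, while the same call from the main thread is still refused until the public write lock is released.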

This is a problem for other things that use python during the lifetime of the process. For instance, we want to eventually have some python code that gets called when a process is about to resume, or just after it stops. We would like to simplify the code for breakpoints that have commands that get run when the breakpoint is hit (right now we defer any actions until the user consumes the public stop event).

> You mentioned that you'll improve the R/W (scoped?) locking classes..
> Any reason to not use boost (or some other C++11 library) for this? If
> we do have to roll our own in LLDB, the lack of tests is worrisome.

I am not a big fan of boost, as it bloats a C++ program's debug info so much that debugging programs that use boost often becomes very difficult due to the sheer size of the debug info. Most of what we cared about from boost is now in C++11. Even if we did use boost, would it actually check to see if the lock was taken prior to trying to release it? The APIs on read/write locks are dead simple, so I don't see this as a reason to use boost.
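
As an illustration of checking whether the lock was taken before releasing it, here is a small sketch. CheckedRWLock and ReplayUnbalancedUnlocks are hypothetical names, and the class is built on std::mutex and std::condition_variable purely for the example; it does not show how LLDB's locking class is or will be written. An unbalanced unlock becomes a no-op that returns false.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical read/write lock that remembers what is held, so an
// unbalanced unlock is reported instead of corrupting the lock.
class CheckedRWLock {
public:
    // Blocks until there is no writer and no reader, then takes the write lock.
    void WriteLock() {
        std::unique_lock<std::mutex> guard(m_mutex);
        m_changed.wait(guard, [this] { return !m_writer && m_readers == 0; });
        m_writer = true;
    }
    // Releases the write lock only if it is held; false means "was not locked".
    bool WriteUnlock() {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            if (!m_writer) return false;
            m_writer = false;
        }
        m_changed.notify_all();
        return true;
    }
    // Blocks while a writer holds the lock, then registers one more reader.
    void ReadLock() {
        std::unique_lock<std::mutex> guard(m_mutex);
        m_changed.wait(guard, [this] { return !m_writer; });
        ++m_readers;
    }
    // Drops one reader only if at least one is registered.
    bool ReadUnlock() {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            if (m_readers == 0) return false;
            --m_readers;
        }
        m_changed.notify_all();
        return true;
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_changed;
    bool m_writer = false;
    int m_readers = 0;
};

// Replays the two unbalanced cases reported in this thread.
void ReplayUnbalancedUnlocks() {
    CheckedRWLock run_lock;
    bool released = run_lock.WriteUnlock();  // "stopped" after attach: never locked
    assert(!released);
    run_lock.WriteLock();
    released = run_lock.WriteUnlock();       // private state thread finishes
    assert(released);
    released = run_lock.WriteUnlock();       // Finalize unlocks the same lock again
    assert(!released);
    (void)released;
}
```

Because the state lives in ordinary members guarded by a mutex, the lock can be taken on one thread and released on another, which is how the flows above use it.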

> If the improvements to the R/W locker classes you've got in progress
> don't allow the test suite to run to completion, could you please
> revert 179329 until we have something that allows us to run the tests?
> Lots of patches are backed up atm due to the LLVM practice of not
> committing on top of a broken trunk.

Yes, I am trying to get us access to a linux machine that we can all use here at Apple so we can debug and fix the things we break.

I spent a large part of the weekend trying to get Ubuntu 12.04 (running under the Parallels Desktop virtualization software) to build llvm/clang/lldb so that I can fix these issues. I wasn't able to get clang to build, as the link stage would always get killed with a signal 9. Not sure why; maybe the virtual machine was running out of RAM or other resources. The build instructions up on the web for Linux don't actually work on a fresh install of Ubuntu. I needed to install new packages for tools essentials and also install gcc-4.7, and then figure out how to get LLVM to use these compilers so things would build with C++11; otherwise the build wouldn't even configure with gcc-4.6, because --enable-libcpp quickly reported that one of the options wasn't supported by the compiler.

So the linux builds are frustrating to try and get working, but I do want everyone to know that I am trying.

What compiler do you build with on linux? Are there packages to install for a suitable version of clang? I finally gave up after many many hours of trying to get lldb to build.

Greg

>
> Dan
>
> PS. The hanging buildbots to watch are:
>
> http://lab.llvm.org:8011/builders/lldb-x86_64-darwin11/builds/1890
> http://lab.llvm.org:8011/builders/lldb-x86_64-debian-clang
> http://lab.llvm.org:8011/builders/lldb-x86_64-linux
>

> On 2013-04-17 12:47 PM, "Greg Clayton" <gclayton at apple.com> wrote:
>
>> On Apr 17, 2013, at 1:27 AM, Carlo Kok <ck at remobjects.com> wrote:
>>
>>> I'm trying to update the Windows branch to the latest and greatest
>>> and found these locking issues (not sure if they're relevant for posix too):
>>>
>>> When I attach a process (I only use the gdb remote) the first event I
>>> get is "stopped" which tries to unlock m_private_run_lock, however
>>> this one is never locked in the first place. Windows' writelock
>>> doesn't appreciate that; as a workaround I added a
>>> m_private_run_lock.WriteLock(); in Process' constructor, which seems
>>> to fix that.
>>
>> We need to fix this better by locking the private run lock when
>> attaching if all goes well.
>>
>>>
>>> The second issue occurs when trying to cause a "Stop" when it's
>>> already paused on internal breakpoints; for me this is during slow
>>> symbol load. What happens is that the loading (which happens from
>>> within Process::ShouldBroadcastEvent) resumes it, then the process exits
>>> properly (triggers the ShouldBroadcastEvent again) however:
>>>
>>> ProcessEventData::DoOnRemoval(lldb_private::Event * event_ptr)
>>> called by Listener::FindNextEventInternal.
>>>
>>> The resume call is in this condition:
>>> if (state != eStateRunning)
>>
>> Where is the above "if (state != eStateRunning)"?
>>
>>> Changing that to:
>>> lldb::StateType state = m_process_sp->GetPrivateState();
>>> if (state != eStateRunning && state != eStateCrashed &&
>>>     state != eStateDetached && state != eStateExited)
>>
>> There are functions that indicate if the process is stopped or running.
>> We should use those functions. (search for "StateIsStopped").
>>
>>>
>>> Seems to fix it, as there's no reason to try & resume a process
>>> that's not running in the first place (and since exiting doesn't
>>> unlock a process this causes a deadlock)
>>>
>>> The last issue is this:
>>> void * Process::RunPrivateStateThread () does :
>>> m_public_run_lock.WriteUnlock(); when it's done. The Finalize also
>>> unlocks that same lock, which Windows crashes on.
>>> With that commented out, it seems to work stably.
>>
>> We need to build some smarts into our Read/Write locking class to
>> know if the read/write lock is taken and only unlock if the
>> corresponding read/write lock is locked. I will make this change today.
>>
>>>
>>> Anyone see any issues in all of this? (might make sense to apply
>>> this to trunk too; it's never good to have unbalanced lock/unlocks)
>>> _______________________________________________
>>> lldb-dev mailing list
>>> lldb-dev at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
