[lldb-dev] Python object lifetimes affect the reliability of tests

Thu Oct 15 11:49:05 PDT 2015

it doesn't appear to be a bug in my implementation.  So it seems like
everyone has been experiencing this problem, but Windows it just made a big
difference because of the mandatory file locks.  From the Python
dcoumentation <https://docs.python.org/2/library/sys.html>,

This function returns a tuple of three values that give information about
the exception that is currently being handled. The information returned is
specific both to the current thread and to the current stack frame. If the
current stack frame is not handling an exception, the information is taken
from the calling stack frame, or its caller, and so on until a stack frame
is found that is handling an exception. *Here, “handling an exception” is
defined as “executing or having executed an except clause.”* For any stack
frame, only information about the most recently handled exception is
accessible.

So within a given stack frame, if an exception has been handled before you,
then every frame and every variable reachable from that frame still has a
strong reference to it.

So it sounds like our solution of moving this exception handler into a
function or calling sys.exc_clear() is the right fix.

On Thu, Oct 15, 2015 at 11:43 AM Zachary Turner <zturner at google.com> wrote:

> It's not that simple.  A test method could be holding onto arbitrary
> resources which could in theory prevent cleanup.  tests can register their
> own cleanup handlers for example, and the teardown will call back into the
> cleanup handler.  So one of those handlers could be making the assumption
> that it can clean something up even though it can't because the test method
> is still holding onto some resources.
>
> I want to find out if this is a bug in my Python implementation, because
> it seems quite strange to me that sys.exc_info(), which is documented to
> return information about the exception *currently being processed*, still
> returns something after it is finished being processed.
>
> On Thu, Oct 15, 2015 at 11:36 AM Ryan Brown <ribrdb at google.com> wrote:
>
>> Couldn't we just change DeleteTarget to make sure everything is unmapped?
>>
>> On Thu, Oct 15, 2015 at 11:34 AM Zachary Turner via lldb-dev <
>> lldb-dev at lists.llvm.org> wrote:
>>
>>> To add more evidence for this, here's a small repro:
>>>
>>> import sys
>>>
>>> print "sys.exc_info() = ", "Empty" if sys.exc_info() == (None, None,
>>> None) else "Valid"
>>> try:
>>>     raise Exception
>>> except Exception, e:
>>>     print "sys.exc_info() = ", "Empty" if sys.exc_info() == (None, None,
>>> None) else "Valid"
>>>     pass
>>>
>>> print "sys.exc_info() = ", "Empty" if sys.exc_info() == (None, None,
>>> None) else "Valid"
>>> print "e = ", "Bound" if 'e' in vars() else "Unbound"
>>> pass
>>>
>>> For me this prints
>>> sys.exc_info() =  Empty
>>> sys.exc_info() =  Valid
>>> sys.exc_info() =  Valid
>>> e =  Bound
>>>
>>> On Thu, Oct 15, 2015 at 11:21 AM Zachary Turner <zturner at google.com>
>>> wrote:
>>>
>>>> We actually do already to the self.dbg.DeleteTarget(target), and that's
>>>> the line that's failing.  The reason it's failing is because the 'sc'
>>>> reference is still alive, which is holding an mmap, which causes a
>>>> mandatory file lock on Windows.
>>>>
>>>> The diagnostics went pretty deep into python internals, but I think we
>>>> might have figured it out.  I don't know if this is a bug in Python, but I
>>>> think we'd probably need to ask Guido to be sure :)
>>>>
>>>> As far as we can tell, what happens is that on the exceptional codepath
>>>> (e.g the assert fails), you walk back up the stack until you get to the
>>>> except handler.  This exception handler is in TestCase.run().  After it
>>>> handles the exception it goes and runs teardown.  However, for some reason,
>>>> Python is still holding a strong reference to the *traceback*, even though
>>>> we're completely out of the finally block.  What this means is that if you
>>>> call `sys.exc_info()` *even after you've exited the finally block, it still
>>>> returns info about the previous exception that's not even being handled
>>>> anymore.  I would have expected this to be gone since there's no exception
>>>> in-fligth anymore.  So basically, Python is still holding a reference to
>>>> the active exception, the exception holds the stack frame, the stack frame
>>>> holds the test method, the test method has locals, one of which is a
>>>> SymbolList, a member of which is symbol context, which has the file locked.
>>>>
>>>> Our best guess is that if you have something like this:
>>>>
>>>> def foo():
>>>>     try:
>>>>        # Do stuff
>>>>     except Exception, e:
>>>>        pass
>>>>     # Do more stuff
>>>>
>>>> that if the exceptional path is executed, then both e and
>>>> sys.exc_info() are alive *while* do more stuff is happening.  We've found
>>>> two ways to fixthis:
>>>>
>>>> 1) Change to this:
>>>> def foo():
>>>>     try:
>>>>        # Do stuff
>>>>     except Exception, e:
>>>>        pass
>>>>     del e
>>>>     sys.exc_clear()
>>>>     # Do more stuff
>>>>
>>>> 2) Put the try / except inside a function.  When the function returns,
>>>> sys.exc_info() is cleared.
>>>>
>>>> I like 2 better, but we're still testing some more to make sure this
>>>> really fixes it 100% of the time.
>>>>
>>>> On Thu, Oct 15, 2015 at 10:25 AM Greg Clayton via lldb-dev <
>>>> lldb-dev at lists.llvm.org> wrote:
>>>>
>>>>>
>>>>> > On Oct 15, 2015, at 8:50 AM, Adrian McCarthy via lldb-dev <
>>>>> lldb-dev at lists.llvm.org> wrote:
>>>>> >
>>>>> > I've tracked down a source of flakiness in tests on Windows to
>>>>> Python object lifetimes and the SB interface, and I'm wondering how best to
>>>>> handle it.
>>>>> >
>>>>> > Consider this portion of a test from TestTargetAPI:
>>>>> >
>>>>> >  def find_functions(self, exe_name):
>>>>> >      """Exercise SBTaget.FindFunctions() API."""
>>>>> >      exe = os.path.join(os.getcwd(), exe_name)
>>>>> >
>>>>> >      # Create a target by the debugger.
>>>>> >      target = self.dbg.CreateTarget(exe)
>>>>> >      self.assertTrue(target, VALID_TARGET)
>>>>> >      list = target.FindFunctions('c', lldb.eFunctionNameTypeAuto)
>>>>> >      self.assertTrue(list.GetSize() == 1)
>>>>> >
>>>>> >      for sc in list:
>>>>> >          self.assertTrue(sc.GetModule().GetFileSpec().GetFilename()
>>>>> == exe_name)
>>>>> >          self.assertTrue(sc.GetSymbol().GetName() == 'c')
>>>>> >
>>>>> > The local variables go out of scope when the function exits, but the
>>>>> SB (C++) objects they represent aren't (always) immediately destroyed.  At
>>>>> least some of these objects keep references to the executable module in the
>>>>> shared module list, so when the test framework cleans up and calls
>>>>> `SBDebugger::DeleteTarget`, the module isn't orphaned, so LLDB maintains an
>>>>> open handle to the executable.
>>>>>
>>>>> Creating a target with:
>>>>>
>>>>>         target = self.dbg.CreateTarget(exe)
>>>>>
>>>>> Will give you a SBTarget object that has a strong reference to the
>>>>> target, but the debugger still has a copy in its target list, so the
>>>>> SBTarget isn't designed to delete the object when the target variable goes
>>>>> out of scope. If you want the target to be deleted, you actually have to
>>>>> call through to the debugger with:
>>>>>
>>>>>
>>>>>  bool
>>>>>  SBDebugger:DeleteTarget (lldb::SBTarget &target);
>>>>>
>>>>>
>>>>> So the right way to clean up the target is:
>>>>>
>>>>>  self.dbg.DeleteTarget(target);
>>>>>
>>>>> Even though there might be code within LLDB that has a valid shared
>>>>> pointer to the lldb_private::Target still, it calls
>>>>> lldb_private::Target::Destroy() which clears out most instance variable
>>>>> (the module list, the process, any plug-ins, etc).
>>>>>
>>>>> SBTarget objects have strong references so that they _can_ keep the
>>>>> object alive if needed in case someone else destroys the target on another
>>>>> thread, but they don't control the lifetime of the target.
>>>>>
>>>>> Other objects have weak references to the objects: SBProcess,
>>>>> SBThread, SBFrame. If the objects are actually destroyed already, the weak
>>>>> pointer won't be able to get a valid shared pointer to the underlying object
>>>>> and any SB API calls on these objects will return error, none, zero,
>>>>> etc...
>>>>>
>>>>> >
>>>>> > The result of the lingering handle is that, when the next test case
>>>>> in the test suite tries to re-build the executable, it fails because the
>>>>> file is not writable.  (This is problematic on Windows because the file
>>>>> system works differently in this regard than Unix derivatives.)  Every
>>>>> subsequent case in the test suite fails.
>>>>> >
>>>>> > I managed to make the test work reliably by rewriting it like this:
>>>>> >
>>>>> >  def find_functions(self, exe_name):
>>>>> >      """Exercise SBTaget.FindFunctions() API."""
>>>>> >      exe = os.path.join(os.getcwd(), exe_name)
>>>>> >
>>>>> >      # Create a target by the debugger.
>>>>> >      target = self.dbg.CreateTarget(exe)
>>>>> >      self.assertTrue(target, VALID_TARGET)
>>>>> >
>>>>> >      try:
>>>>> >          list = target.FindFunctions('c', lldb.eFunctionNameTypeAuto)
>>>>> >          self.assertTrue(list.GetSize() == 1)
>>>>> >
>>>>> >          for sc in list:
>>>>> >              try:
>>>>> >
>>>>> self.assertTrue(sc.GetModule().GetFileSpec().GetFilename() == exe_name)
>>>>> >                  self.assertTrue(sc.GetSymbol().GetName() == 'c')
>>>>> >              finally:
>>>>> >                  del sc
>>>>> >
>>>>> >      finally:
>>>>> >          del list
>>>>> >
>>>>> > The finally blocks ensure that the corresponding C++ objects are
>>>>> destroyed, even if the function exits as a result of a Python exception
>>>>> (e.g., if one of the assertion expressions is false and the code throws an
>>>>> exception).  Since the objects are destroyed, the reference counts are back
>>>>> to where they should be, and the orphaned module is closed when the target
>>>>> is deleted.
>>>>> >
>>>>> > But this is ugly and maintaining it would be error prone.  Is there
>>>>> a better way to address this?
>>>>>
>>>>> So you should be able to fix this by deleting the target with
>>>>> "self.dbg.DeleteTarget(target)"
>>>>>
>>>>> We could change all tests over to always store any targets they create
>>>>> in the test object itself:
>>>>>
>>>>> self.target = self.dbg.CreateTarget(exe)
>>>>>
>>>>> Then the test suite could check for the existance of "self.target" and
>>>>> if it exists, it could call "self.dbg.DeleteTarget(self.target)"
>>>>> automatically to avoid such issues?
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> lldb-dev mailing list
>>>>> lldb-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>>>
>>>> _______________________________________________
>>> lldb-dev mailing list
>>> lldb-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20151015/24e57d13/attachment-0001.html>