[lldb-dev] Python object lifetimes affect the reliability of tests

Thu Oct 15 11:33:55 PDT 2015

To add more evidence for this, here's a small repro:

import sys

print "sys.exc_info() = ", "Empty" if sys.exc_info() == (None, None, None) else "Valid"
try:
    raise Exception
except Exception, e:
    print "sys.exc_info() = ", "Empty" if sys.exc_info() == (None, None, None) else "Valid"
    pass

print "sys.exc_info() = ", "Empty" if sys.exc_info() == (None, None, None) else "Valid"
print "e = ", "Bound" if 'e' in vars() else "Unbound"
pass

For me this prints
sys.exc_info() =  Empty
sys.exc_info() =  Valid
sys.exc_info() =  Valid
e =  Bound
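
For comparison, here is a rough Python 3 counterpart of the repro above. The helper names exc_state and probe are made up for illustration. Python 3 clears the handled exception when the except block exits and unbinds the "as" name, so the last two results should differ from the Python 2 output shown above.

```python
import sys


def exc_state():
    """Return 'Empty' when no exception is being handled, else 'Valid'."""
    return "Empty" if sys.exc_info() == (None, None, None) else "Valid"


def probe():
    states = [exc_state()]              # before the try
    try:
        raise Exception
    except Exception as e:
        states.append(exc_state())      # inside the handler
    states.append(exc_state())          # after the handler has exited
    # Python 3 deletes the 'as' name at the end of the except clause.
    states.append("Bound" if "e" in vars() else "Unbound")
    return states


print(probe())
```

This only illustrates the language-level difference; the test suite discussed in this thread runs under Python 2, where the lingering behavior applies.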

On Thu, Oct 15, 2015 at 11:21 AM Zachary Turner <zturner at google.com> wrote:

> We actually already call self.dbg.DeleteTarget(target), and that's
> the line that's failing.  It fails because the 'sc' reference is still
> alive; it holds an mmap, which causes a mandatory file lock on Windows.
>
> The diagnostics went pretty deep into python internals, but I think we
> might have figured it out.  I don't know if this is a bug in Python, but I
> think we'd probably need to ask Guido to be sure :)
>
> As far as we can tell, what happens is that on the exceptional codepath
> (e.g. the assert fails), you walk back up the stack until you get to the
> except handler.  This exception handler is in TestCase.run().  After it
> handles the exception it goes and runs teardown.  However, for some reason,
> Python is still holding a strong reference to the *traceback*, even though
> we're completely out of the finally block.  What this means is that if you
> call `sys.exc_info()` *even after* you've exited the finally block, it still
> returns info about the previous exception, which isn't even being handled
> anymore.  I would have expected this to be gone since there's no exception
> in flight anymore.  So basically, Python is still holding a reference to
> the active exception, the exception holds the stack frame, the stack frame
> holds the test method, the test method has locals, one of which is a
> SymbolList, a member of which is a symbol context, which has the file locked.
>
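The reference chain described above (exception, traceback, frame, locals) can be observed with a weak reference. This is only a sketch in Python 3: Resource, failing_body and demo are hypothetical names, and the exception is kept alive by hand to imitate the lingering sys.exc_info() seen under Python 2.

```python
import gc
import weakref


class Resource(object):
    """Hypothetical stand-in for an object that keeps a file locked."""


def failing_body(refs):
    res = Resource()                    # local owned by this function's frame
    refs.append(weakref.ref(res))       # lets the caller observe its lifetime
    raise AssertionError("simulated failed assert")


def demo():
    refs = []
    saved = None
    try:
        failing_body(refs)
    except AssertionError as exc:
        saved = exc                     # keep the exception alive past the handler
    alive_while_held = refs[0]() is not None
    saved = None                        # drop exception -> traceback -> frame -> locals
    gc.collect()
    alive_after_drop = refs[0]() is not None
    return alive_while_held, alive_after_drop


print(demo())
```

As long as anything holds the exception, the local in the failed function stays alive, which matches the symptom of the file remaining locked during teardown.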
> Our best guess is that if you have something like this:
>
> def foo():
>     try:
>         do_stuff()         # may raise
>     except Exception, e:
>         pass
>     do_more_stuff()        # "do more stuff"
>
> then, if the exceptional path is executed, both e and sys.exc_info()
> are still alive *while* "do more stuff" is happening.  We've found two
> ways to fix this:
>
> 1) Change to this:
> import sys
>
> def foo():
>     try:
>         do_stuff()         # may raise
>     except Exception, e:
>         del e              # done here: 'e' is unbound if nothing was raised
>     sys.exc_clear()        # Python 2 only
>     do_more_stuff()
>
> 2) Put the try / except inside a function.  When the function returns,
> sys.exc_info() is cleared.
>
> I like 2 better, but we're still testing some more to make sure this
> really fixes it 100% of the time.
>
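Option 2 could look roughly like the following. It is a Python 3 sketch with made-up names (run_isolated, failing_body, Resource), not the actual change to the test suite. The helper returns a formatted string rather than the exception object, so nothing that references the failed frame survives the call.

```python
import gc
import sys
import traceback
import weakref


class Resource(object):
    """Hypothetical stand-in for an object that keeps a file locked."""


def failing_body(refs):
    res = Resource()
    refs.append(weakref.ref(res))
    raise AssertionError("simulated failed assert")


def run_isolated(body, *args):
    """Run body; return None on success or the traceback text on failure."""
    try:
        body(*args)
    except Exception:
        return traceback.format_exc()   # plain string, holds no frame references
    return None


def demo():
    refs = []
    report = run_isolated(failing_body, refs)
    gc.collect()
    released = refs[0]() is None
    cleared = sys.exc_info() == (None, None, None)
    return report is not None, released, cleared


print(demo())
```

Under Python 3 the exception state is cleared when the handler exits anyway; the helper mainly guarantees that no caller ends up holding the exception or its traceback.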
> On Thu, Oct 15, 2015 at 10:25 AM Greg Clayton via lldb-dev <
> lldb-dev at lists.llvm.org> wrote:
>
>>
>> > On Oct 15, 2015, at 8:50 AM, Adrian McCarthy via lldb-dev <lldb-dev at lists.llvm.org> wrote:
>> >
>> > I've tracked down a source of flakiness in tests on Windows to Python
>> > object lifetimes and the SB interface, and I'm wondering how best to
>> > handle it.
>> >
>> > Consider this portion of a test from TestTargetAPI:
>> >
>> >  def find_functions(self, exe_name):
>> >      """Exercise SBTarget.FindFunctions() API."""
>> >      exe = os.path.join(os.getcwd(), exe_name)
>> >
>> >      # Create a target by the debugger.
>> >      target = self.dbg.CreateTarget(exe)
>> >      self.assertTrue(target, VALID_TARGET)
>> >      list = target.FindFunctions('c', lldb.eFunctionNameTypeAuto)
>> >      self.assertTrue(list.GetSize() == 1)
>> >
>> >      for sc in list:
>> >          self.assertTrue(sc.GetModule().GetFileSpec().GetFilename() == exe_name)
>> >          self.assertTrue(sc.GetSymbol().GetName() == 'c')
>> >
>> > The local variables go out of scope when the function exits, but the SB
>> > (C++) objects they represent aren't (always) immediately destroyed.  At
>> > least some of these objects keep references to the executable module in
>> > the shared module list, so when the test framework cleans up and calls
>> > `SBDebugger::DeleteTarget`, the module isn't orphaned, so LLDB maintains
>> > an open handle to the executable.
>>
>> Creating a target with:
>>
>>         target = self.dbg.CreateTarget(exe)
>>
>> will give you an SBTarget object that has a strong reference to the
>> target, but the debugger still has a copy in its target list, so the
>> SBTarget isn't designed to delete the object when the target variable goes
>> out of scope. If you want the target to be deleted, you actually have to
>> call through to the debugger with:
>>
>>
>>  bool
>>  SBDebugger::DeleteTarget (lldb::SBTarget &target);
>>
>>
>> So the right way to clean up the target is:
>>
>>  self.dbg.DeleteTarget(target);
>>
>> Even though code within LLDB might still hold a valid shared pointer to
>> the lldb_private::Target, DeleteTarget calls
>> lldb_private::Target::Destroy(), which clears out most instance variables
>> (the module list, the process, any plug-ins, etc).
>>
>> SBTarget objects have strong references so that they _can_ keep the
>> object alive if needed in case someone else destroys the target on another
>> thread, but they don't control the lifetime of the target.
>>
>> Other objects have weak references to the objects: SBProcess, SBThread,
>> SBFrame. If the objects are actually destroyed already, the weak pointer
>> won't be able to get a valid shared pointer to the underlying object
>> and any SB API calls on these objects will return error, none, zero,
>> etc...
>>
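The strong versus weak handle behavior described above can be mimicked in plain Python. This is only an analogy for the C++ shared/weak pointers, not LLDB's implementation; Target, StrongHandle and WeakHandle are hypothetical names, and it does not model Target::Destroy().

```python
import weakref


class Target(object):
    """Hypothetical stand-in for the underlying target object."""

    def __init__(self, name):
        self.name = name


class StrongHandle(object):
    """Keeps the object alive, as SBTarget is described to do."""

    def __init__(self, obj):
        self._obj = obj

    def get_name(self):
        return self._obj.name


class WeakHandle(object):
    """Does not keep the object alive, like SBProcess/SBThread/SBFrame."""

    def __init__(self, obj):
        self._ref = weakref.ref(obj)

    def get_name(self):
        obj = self._ref()
        return obj.name if obj is not None else None


def demo():
    owner = [Target("a.out")]           # plays the role of the debugger's target list
    strong = StrongHandle(owner[0])
    weak = WeakHandle(owner[0])
    owner.clear()                       # the owner lets go; the strong handle remains
    before = weak.get_name()
    del strong                          # last strong reference is gone
    after = weak.get_name()
    return before, after


print(demo())
```

Once the last strong reference disappears, calls through the weak handle fall back to an empty result instead of touching a dead object.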
>> >
>> > The result of the lingering handle is that, when the next test case in
>> > the test suite tries to re-build the executable, it fails because the
>> > file is not writable.  (This is problematic on Windows because the file
>> > system works differently in this regard than Unix derivatives.)  Every
>> > subsequent case in the test suite fails.
>> >
>> > I managed to make the test work reliably by rewriting it like this:
>> >
>> >  def find_functions(self, exe_name):
>> >      """Exercise SBTarget.FindFunctions() API."""
>> >      exe = os.path.join(os.getcwd(), exe_name)
>> >
>> >      # Create a target by the debugger.
>> >      target = self.dbg.CreateTarget(exe)
>> >      self.assertTrue(target, VALID_TARGET)
>> >
>> >      try:
>> >          list = target.FindFunctions('c', lldb.eFunctionNameTypeAuto)
>> >          self.assertTrue(list.GetSize() == 1)
>> >
>> >          for sc in list:
>> >              try:
>> >                  self.assertTrue(sc.GetModule().GetFileSpec().GetFilename() == exe_name)
>> >                  self.assertTrue(sc.GetSymbol().GetName() == 'c')
>> >              finally:
>> >                  del sc
>> >
>> >      finally:
>> >          del list
>> >
>> > The finally blocks ensure that the corresponding C++ objects are
>> > destroyed, even if the function exits as a result of a Python exception
>> > (e.g., if one of the assertion expressions is false and the code throws
>> > an exception).  Since the objects are destroyed, the reference counts
>> > are back to where they should be, and the orphaned module is closed when
>> > the target is deleted.
>> >
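The effect of the finally/del pattern can be checked in isolation. The sketch below is Python 3 with hypothetical names (Resource, body_with_finally, demo); it keeps the exception alive on purpose to show that the deleted local is released even though the failed frame is still reachable through the traceback.

```python
import weakref


class Resource(object):
    """Hypothetical stand-in for an SB object wrapping a C++ object."""


def body_with_finally(refs):
    res = Resource()
    refs.append(weakref.ref(res))
    try:
        raise AssertionError("simulated failed assert")
    finally:
        del res                         # unbind the local on the exceptional path too


def demo():
    refs = []
    saved = None
    try:
        body_with_finally(refs)
    except AssertionError as exc:
        saved = exc                     # traceback and failed frame stay reachable
    return saved is not None, refs[0]() is None


print(demo())
```

The object is gone because the frame no longer has a local pointing at it, which is the property the rewritten test relies on.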
>> > But this is ugly and maintaining it would be error prone.  Is there a
>> > better way to address this?
>>
>> So you should be able to fix this by deleting the target with
>> "self.dbg.DeleteTarget(target)"
>>
>> We could change all tests over to always store any targets they create in
>> the test object itself:
>>
>> self.target = self.dbg.CreateTarget(exe)
>>
>> Then the test suite could check for the existence of "self.target" and, if
>> it exists, call "self.dbg.DeleteTarget(self.target)" automatically
>> to avoid such issues?
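
A minimal sketch of that idea follows, assuming a tearDown hook in a shared base class. FakeDebugger, TargetTestBase and the check_failing method are hypothetical stand-ins so the example runs without lldb; only CreateTarget and DeleteTarget mirror names from this thread.

```python
import unittest


class FakeDebugger(object):
    """Hypothetical stand-in for the debugger; records deleted targets."""

    def __init__(self):
        self.deleted = []

    def CreateTarget(self, exe):
        return "target:" + exe

    def DeleteTarget(self, target):
        self.deleted.append(target)
        return True


class TargetTestBase(unittest.TestCase):
    dbg = None                          # assigned by run_demo below

    def tearDown(self):
        # Delete the target automatically if the test stored one on self.
        target = getattr(self, "target", None)
        if target is not None:
            self.dbg.DeleteTarget(target)
            self.target = None


class FindFunctionsCase(TargetTestBase):
    def check_failing(self):
        self.target = self.dbg.CreateTarget("a.out")
        self.assertTrue(False)          # tearDown still runs after this failure


def run_demo():
    TargetTestBase.dbg = FakeDebugger()
    result = unittest.TestResult()
    FindFunctionsCase("check_failing").run(result)
    return len(result.failures), TargetTestBase.dbg.deleted


print(run_demo())
```

The point of the design is that cleanup no longer depends on each test remembering to delete its target, including on the failing path.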