[Lldb-commits] [lldb] 1b1d981 - Revert "Revert "Add the ability to write target stop-hooks using the ScriptInterpreter.""

Jim Ingham via lldb-commits lldb-commits at lists.llvm.org
Wed Sep 30 14:21:42 PDT 2020

The test doesn’t seem to be flaky in the “run it a bunch of times and it will eventually fail” sense. I ran the test 200 times on my machine and didn’t get a failure.

Another weird fact is that there are two ways to auto-continue from a stop-hook: either by passing the --auto-continue flag when adding the stop hook, or by returning False from handle_stop. All the failures I have seen on the bots are in the “return False” test; the auto-continue test never fails.
That’s relevant because the failure is that returning False from the stop hook doesn’t cause us to continue.
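For reference, the “return False” flavor of stop hook is just a Python class whose handle_stop returns False to request a continue. A minimal sketch (the class name here is illustrative; the handle_stop signature follows the scripted stop-hook feature being discussed):

```python
class ContinueHook:
    """Minimal scripted stop-hook sketch: returning False from
    handle_stop asks LLDB to auto-continue the process."""

    def __init__(self, target, extra_args, internal_dict):
        # target is the lldb.SBTarget the hook was installed on.
        self.target = target

    def handle_stop(self, exe_ctx, stream):
        # exe_ctx is an lldb.SBExecutionContext for this stop;
        # stream is an lldb.SBStream for output.
        stream.Print("ContinueHook fired\n")
        # False means "please continue"; True means "stay stopped".
        return False
```

The other path would be added with something like `target stop-hook add --auto-continue ...`, where the flag, not the callback’s return value, drives the continue.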

Those two scenarios differ in only two ways: 

1) In the “return False” case, we have to get the False result from the return of the callback invocation in Python, through the ScriptInterpreter, and back to the C++ code.  In the “auto-continue” case we’re just reading a flag in the StopHook.

2) When we go to decide whether to auto-continue or not we do:

  if (!somebody_restarted && ((hooks_ran && !should_stop) || auto_continue))

where the auto_continue case is controlled by the setting of the auto-continue flag, while should_stop is computed from the handle_stop return value.  If we’re fetching the return value correctly, then I can’t see how the “return False” case could be flaky while the “auto-continue” one is not.
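To make the comparison concrete, that decision can be transcribed as a small Python predicate (a sketch for reasoning about the two paths, not the actual C++):

```python
def should_resume(somebody_restarted, hooks_ran, should_stop, auto_continue):
    # Direct transcription of:
    #   if (!somebody_restarted && ((hooks_ran && !should_stop) || auto_continue))
    return (not somebody_restarted) and (
        (hooks_ran and not should_stop) or auto_continue
    )
```

If should_stop faithfully reflects the Python return value, the `(hooks_ran and not should_stop)` arm and the `auto_continue` arm are symmetric, which is why a failure in only one of them points at the value’s round trip through the ScriptInterpreter.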

When I resubmitted this patch, I added a print of the return value from the stop hook’s handle_stop.  Even in the failing case the hook is running and reports that it is returning false from the Python side of the hook correctly.  So we’re right up to there.  After that I don’t have any visibility into why this is failing.
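The diagnostic I added amounts to printing the value just before it leaves Python; something along these lines (hypothetical names, the real test hook differs):

```python
class NoisyHook:
    """Stop-hook sketch that reports its own return value, so the test
    log shows what the Python side actually handed back to LLDB."""

    def __init__(self, target, extra_args, internal_dict):
        pass

    def handle_stop(self, exe_ctx, stream):
        result = False  # ask LLDB to continue
        # Log the value we are about to return, before it crosses
        # back through the ScriptInterpreter.
        stream.Print("handle_stop returning: {}\n".format(result))
        return result
```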

BTW, another obvious cause of “flakiness” is uninitialized variables lying around, but there aren’t many variables in this code, and as far as I can see everything is initialized before use.

Anyway, if people don’t mind, I’ll check in a change that makes the stop hooks a little noisier by reporting the result as it passes through the ScriptInterpreter to RunStopHooks.  That will show up in the test log, so with any luck I’ll be able to narrow down the cause of the failure that way.


> On Sep 30, 2020, at 12:01 PM, Pavel Labath <pavel at labath.sk> wrote:
> On 30/09/2020 20:45, Jim Ingham wrote:
>> I also used to get e-mails when a test failed and I was on the changes list.  But I haven’t gotten any failure e-mails.  Does that only happen for some of the bots (or has that stopped working), or should I look at my filters?
>> Jim
> You didn't get an email when the patch was committed, because the test
> happened to pass the first time around and only fail in some of the
> later builds. That's the problem with flaky tests -- whenever they fail
> (flake) a random person gets a breakage email for their unrelated change.
> As the test flaps on and off nondeterministically, I very much doubt
> this is a problem with the incremental build. E.g. the only change in
> build http://lab.llvm.org:8011/builders/lldb-x86_64-debian/builds/18360
> was a change to the gn build files, which is a no-op for a regular
> cmake build. Both the builds before and after it were green.
> Though it's possible, I would be surprised if this problem is limited to
> linux systems -- a more likely explanation is that the linux buildbots
> have a much higher throughput (lldb-x86_64-debian clocks 70 builds per
> day vs. 13 builds/day for lldb-cmake on green dragon), and so flaky
> tests get noticed sooner there.
> pl
