[Lldb-commits] [lldb] 1b1d981 - Revert "Revert "Add the ability to write target stop-hooks using the ScriptInterpreter.""

Thu Oct 1 13:17:53 PDT 2020

Thanks for the info…

We’re running in sync mode, so for Continue to return before the process is all the way stopped, Process::ResumeSynchronous() must be bobbling the case where a stop event that resumes the process comes in.  The only way I can see that happening is if Process::PrivateResume can return before setting the private state to eStateRunning.  And even then, only when something gets a chance to time out somewhere along the line.
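To make that hypothesis concrete, here is a toy Python model of the suspected race. It is only a sketch and not LLDB code: ToyProcess, continue_sync, and the "stopped"/"running" strings are invented for illustration, and the Event that holds the worker back merely stands in for a heavily loaded machine scheduling the other thread late.

```python
import threading


class ToyProcess:
    """Toy stand-in for a debuggee's state; NOT LLDB's Process class."""

    def __init__(self):
        self.state = "stopped"
        self._cond = threading.Condition()

    def set_state(self, new_state):
        with self._cond:
            self.state = new_state
            self._cond.notify_all()

    def wait_for_stop(self, timeout):
        # Returns True if the state is (or becomes) "stopped" within timeout.
        with self._cond:
            return self._cond.wait_for(lambda: self.state == "stopped", timeout)


def continue_sync(proc, mark_running_before_return):
    """Hypothetical synchronous continue.

    Returns True if the wait was satisfied by the stale "stopped" state,
    i.e. the caller would carry on before the inferior really stopped again.
    """
    worker_may_run = threading.Event()

    def worker():
        worker_may_run.wait()      # simulate the worker being scheduled late
        proc.set_state("running")  # transition observed only now
        proc.set_state("stopped")  # the real stop

    thread = threading.Thread(target=worker)
    thread.start()

    if mark_running_before_return:
        proc.set_state("running")  # state updated before we start waiting

    returned_early = proc.wait_for_stop(timeout=0.05)

    worker_may_run.set()
    thread.join()
    return returned_early


print(continue_sync(ToyProcess(), mark_running_before_return=True))
print(continue_sync(ToyProcess(), mark_running_before_return=False))
```

In this model, the waiter is only fooled when the state is still "stopped" at the moment it starts waiting, which is the window I am speculating about above.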

I’ve been looking but I haven’t found anything relevant along this code path.  There is one hard-coded timeout along this path, the 5-second wait between sending the eBroadcastBitAsyncContinue with the continue packet to the gdb-remote async thread and receiving the ack back (in ProcessGDBRemote::DoResume).  But if that timed out it would write a “gdb-remote process” log message: “Resume timed out”.  The process log is on, I see other output from it, but I don’t see that message in the failure transcript.

Weird.

Jim

> On Oct 1, 2020, at 5:26 AM, Raphael “Teemperor” Isemann <teemperor at gmail.com> wrote:
> 
> +1, I have two machines with very similar setup where only the one that is under heavy load sees the test failures.
> 
> - Raphael
> 
>> On 1 Oct 2020, at 14:24, Pavel Labath <pavel at labath.sk> wrote:
>> 
>> On 30/09/2020 23:21, Jim Ingham wrote:
>>> The test doesn’t seem to be flaky in the “run it a bunch of times and
>>> it will eventually fail” sense.  I ran the test 200 times on my
>>> machine and didn’t get a failure.
>> 
>> Actually, it seems like exactly the typical kind of flaky test to me --
>> it mostly works when run on its own, but starts failing as soon as the
>> system comes under load.
>> 
>> It didn't fail for me either for over 100 iterations. However, as soon
>> as I cranked up the cpu load (compiling llvm is good at that), it failed
>> almost immediately.
>> 
>> It also doesn't seem to be related to the way the stop hook resumes the
>> process.
>> <http://lab.llvm.org:8011/builders/lldb-aarch64-ubuntu/builds/9516/steps/test/logs/stdio>
>> is one example where the auto_continue version of the test fails, and I
>> have seen both tests fail on my machine.
>> 
>> I have some traces of failing and successful runs of the test (will send
>> them to you in a private email). I didn't dive too deeply, but the
>> problem does not seem to be related to python stop hooks. It looks more
>> like a general stop hook bug, which we've had problems with in the past.
>> 
>> The problem seems to be that the process.Continue() on the main thread
>> returns early, and so the subsequent checks (for the topmost frame etc.)
>> execute concurrently with the "step out" action. In the "Failure" file
>> I'll send you, you can see that (line 9222) SBFrame::GetFunctionName is
>> called before the inferior process stops in the main function (the
>> processing of that happens immediately after that line, on the
>> "intern-state" thread).
>> 
>> pl
>