[Lldb-commits] [lldb] Fix a bug with cancelling "attach -w" after you have run a process previously (PR #65822)

Wed Sep 20 13:15:16 PDT 2023

jimingham wrote:

You are right, there is a racy bit here - I was also getting about one failure per 50 runs.

The raciness is due to the time it takes between when we send the interrupt and when we finish up the command and produce the results.  So we can end up reading from the command output before the command has written its result.

I can make this more deterministic by waiting for the process state to shift to eStateExited.  But even with that it was still a little flaky, because there's still some time between when the interrupt is noticed and switches the process state, and when lldb makes the command return and prints the result to stdout.

Unfortunately this is a test of command-line behavior; you don't get the same problem with the SB APIs directly.  So I have to do it as a "run a command-line command" test, and that makes it harder to get direct signals about when to look at the result.

If we sent "command completed" events I could wait for that, and then the test would be deterministic.  That seems like a feature we should design, not jam in to fix a test problem.
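
To illustrate the kind of signal a "command completed" event would give the test, here is a minimal sketch in plain Python.  The CommandRunner class and its `completed` attribute are hypothetical stand-ins for illustration; nothing like this exists in lldb today, which is the point of the paragraph above.

```python
import threading


class CommandRunner:
    """Hypothetical stand-in for a command interpreter; NOT an lldb API."""

    def __init__(self):
        # Set once the command has finished AND its result has been recorded.
        self.completed = threading.Event()
        self.output = []

    def run(self, work):
        """Run work() on a background thread and signal when it is done."""
        self.completed.clear()

        def body():
            try:
                self.output.append(work())
            finally:
                # Signalled only after the result is written, so a waiter
                # never reads the output too early.
                self.completed.set()

        thread = threading.Thread(target=body, daemon=True)
        thread.start()
        return thread
```

A test built on this would call `runner.completed.wait(timeout=...)` and only then inspect `runner.output`, instead of guessing how long to sleep.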

However, if I just insert a time.sleep(1) between seeing the state flip to eStateExited and looking at the results, I can run this 200 times and see no failures.  That sleep only has to cover the time to back out of the command execution stack and write the result to stdout, so it shouldn't be as variable.  If this still ends up failing intermittently, then I'll have to cook up some kind of "command completed" event that the test can wait on.
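
Roughly, the wait described above could be sketched like this.  The helper names (wait_for, read_results_after_exit) and the timeout, interval, and settle parameters are made up for illustration and are not part of the actual patch; the two callables stand in for the state check and the stdout read.

```python
import time


def wait_for(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One last check so a state change right at the deadline is not missed.
    return predicate()


def read_results_after_exit(state_is_exited, read_lines, settle=1.0, timeout=10.0):
    """Wait for the exited state, give the command time to finish, then read."""
    if not wait_for(state_is_exited, timeout=timeout):
        raise TimeoutError("process never reached the exited state")
    # Grace period: covers unwinding the command execution stack and
    # writing the result to stdout.
    time.sleep(settle)
    return read_lines()
```

In the test, state_is_exited would be something like `lambda: target.process.state == lldb.eStateExited`, and read_lines would flush and read the file at self.stdout_path.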

Jim

> On Sep 20, 2023, at 11:05 AM, Jim Ingham ***@***.***> wrote:
> 
> The way the test works, we run one real process and let it exit.  Then we do "attach -w -n noone_would_use_this_name" because we don't want the second attach attempt to be able to succeed.  lldb should just stay stuck until the interrupt succeeds; there shouldn't be any other way to get out of this.  It sounds like something else is kicking us out of the attach wait loop on these systems?
> 
> Jim
> 
> 
> 
>> On Sep 20, 2023, at 2:56 AM, David Spickett ***@***.***> wrote:
>> 
>> 
>> @DavidSpickett commented on this pull request.
>> 
>> In lldb/test/API/commands/process/attach/TestProcessAttach.py <https://github.com/llvm/llvm-project/pull/65822#discussion_r1331380724>:
>> 
>> > +            time.sleep(1)
>> +            if target.process.state == lldb.eStateAttaching:
>> +                break
>> +
>> +        self.dbg.DispatchInputInterrupt()
>> +        self.dbg.DispatchInputInterrupt()
>> +
>> +        self.out_filehandle.flush()
>> +        reader = open(self.stdout_path, "r")
>> +        results = reader.readlines()
>> +        found_result = False
>> +        for line in results:
>> +            if "Cancelled async attach" in line:
>> +                found_result = True
>> +                break
>> +        self.assertTrue(found_result, "Found async error in results")
>> This fails if results is empty.
>> 
>> ********************************
>> []
>> ********************************
>> FAIL: LLDB (/home/david.spickett/build-llvm-aarch64/bin/clang-aarch64) :: test_run_then_attach_wait_interrupt (TestProcessAttach.ProcessAttachTestCase)
>> <bound method SBProcess.Kill of SBProcess: pid = 0, state = exited, threads = 0, executable = ProcessAttach>: success
>> 
>> Restore dir to: /home/david.spickett/build-llvm-aarch64
>> ======================================================================
>> FAIL: test_run_then_attach_wait_interrupt (TestProcessAttach.ProcessAttachTestCase)
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>   File "/home/david.spickett/llvm-project/lldb/test/API/commands/process/attach/TestProcessAttach.py", line 195, in test_run_then_attach_wait_interrupt
>>     self.assertTrue(found_result, "Found async error in results")
>> AssertionError: False is not True : Found async error in results
>> Config=aarch64-/home/david.spickett/build-llvm-aarch64/bin/clang
>> ----------------------------------------------------------------------
>> When the test works the results contain:
>> 
>> ********************************
>> ['error: attach failed: Cancelled async attach.\n', '\n', '... Interrupted.\n']
>> ********************************
>> Running it in a loop it took ~40 runs to get a failing one.
>> 
>> I wonder if that is because the attach happens to finish a lot faster sometimes, so there's no time to cancel it? If that's down to the OS and machine load, I'm not sure how we'd make this predictable.
>> 
>> The ugly option is to say if the results are empty, pass the test and assume the other 39 runs will check for the bug.
>> 
>> 
> 

https://github.com/llvm/llvm-project/pull/65822