[Lldb-commits] [lldb] Fix flake in TestZerothFrame.py (PR #96685)

Kendal Harland via lldb-commits lldb-commits at lists.llvm.org
Tue Jun 25 12:26:30 PDT 2024


https://github.com/kendalharland created https://github.com/llvm/llvm-project/pull/96685

This test is currently flaky on a local Windows amd64 build.

If we print lldb's inputs and outputs while running, we can see that the breakpoints are always being set correctly, and always being hit:

```sh
runCmd: breakpoint set -f "main.c" -l 2
output: Breakpoint 1: where = a.out`func_inner + 1 at main.c:2:9, address = 0x0000000140001001

runCmd: breakpoint set -f "main.c" -l 7
output: Breakpoint 2: where = a.out`main + 17 at main.c:7:5, address = 0x0000000140001021

runCmd: run
output: Process 52328 launched: 'C:\workspace\llvm-project\llvm\build\lldb-test-build.noindex\functionalities\unwind\zeroth_frame\TestZerothFrame.test_dwarf\a.out' (x86_64)
Process 52328 stopped
* thread #1, stop reason = breakpoint 1.1
    frame #0: 0x00007ff68f6b1001 a.out`func_inner at main.c:2:9
   1    void func_inner() {
-> 2        int a = 1;  // Set breakpoint 1 here
                ^
   3    }
   4
   5    int main() {
   6        func_inner();
   7        return 0; // Set breakpoint 2 here
```

However, sometimes the backtrace printed in this test shows that the process is stopped inside NtWaitForWorkViaWorkerFactory from `ntdll.dll`:

```sh
Backtrace at the first breakpoint:
frame #0: 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
frame #1: 0x00007ffecc74585e ntdll.dll`RtlClearThreadWorkOnBehalfTicket + 862
frame #2: 0x00007ffecc3e257d kernel32.dll`BaseThreadInitThunk + 29
frame #3: 0x00007ffecc76af28 ntdll.dll`RtlUserThreadStart + 40
```

If we print the list of threads each time the test is run, we notice that threads are sometimes in a different order, within `process.threads`:

```sh
Thread 0: thread #4: tid = 0x9c38, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 1: thread #2: tid = 0xa950, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 2: thread #1: tid = 0xab18, 0x00007ff64bc81001 a.out`func_inner at main.c:2:9, stop reason = breakpoint 1.1
Thread 3: thread #3: tid = 0xc514, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20

Thread 0: thread #3: tid = 0x018c, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 1: thread #1: tid = 0x85c8, 0x00007ff7130c1001 a.out`func_inner at main.c:2:9, stop reason = breakpoint 1.1
Thread 2: thread #2: tid = 0xf344, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 3: thread #4: tid = 0x6a50, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
```

We're interested in whichever thread is executing `a.out`. If we print more info, we can see that this thread always has an `IndexID` of 1 regardless of where it appars in `process.threads`:

```sh
Thread 0: thread #3: tid = 0x2474, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
-- GetName:
-- GetQueueName: None
-- GetQueueID: 0
-- GetIndexId: 3
Thread 1: thread #2: tid = 0x2864, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
-- GetName:
-- GetQueueName: None
-- GetQueueID: 0
-- GetIndexId: 2
Thread 2: thread #4: tid = 0x9c90, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
-- GetName:
-- GetQueueName: None
-- GetQueueID: 0
-- GetIndexId: 4
Thread 3: thread #1: tid = 0xebbc, 0x00007ff643331001 a.out`func_inner at main.c:2:9, stop reason = breakpoint 1.1
-- GetName:
-- GetQueueName: None
-- GetQueueID: 0
-- GetIndexId: 1
```

By switching from `process.GetThreadAtIndex` to
`process.GetThreadByIndex` we consistently retrieve the correct thread.

This raises a few more questions that might lead to a different followup solution:

1. Why does our target thread always have an `IndexID` of 1? Why not 0 or any other value?
2. Why is `process.threads` non-deterministically ordered?

>From 5f23d2c48252c880a20e04f273d2d34b80758b6f Mon Sep 17 00:00:00 2001
From: kendal <kendal at thebrowser.company>
Date: Mon, 24 Jun 2024 13:42:20 -0700
Subject: [PATCH] Fix flake in TestZerothFrame.py

If we print lldb's input and output while running this
test, we can see that the breakpoints are always being
set correctly, and always being hit:

```sh
runCmd: breakpoint set -f "main.c" -l 2
output: Breakpoint 1: where = a.out`func_inner + 1 at main.c:2:9, address = 0x0000000140001001

runCmd: breakpoint set -f "main.c" -l 7
output: Breakpoint 2: where = a.out`main + 17 at main.c:7:5, address = 0x0000000140001021

runCmd: run
output: Process 52328 launched: 'C:\workspace\llvm-project\llvm\build\lldb-test-build.noindex\functionalities\unwind\zeroth_frame\TestZerothFrame.test_dwarf\a.out' (x86_64)
Process 52328 stopped
* thread #1, stop reason = breakpoint 1.1
    frame #0: 0x00007ff68f6b1001 a.out`func_inner at main.c:2:9
   1    void func_inner() {
-> 2        int a = 1;  // Set breakpoint 1 here
                ^
   3    }
   4
   5    int main() {
   6        func_inner();
   7        return 0; // Set breakpoint 2 here
```

However, sometimes the backtrace printed in this test shows that
the process is stopped is inside NtWaitForWorkViaWorkerFactory from ntdll.dll:

```sh
Backtrace at the first breakpoint:
frame #0: 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
frame #1: 0x00007ffecc74585e ntdll.dll`RtlClearThreadWorkOnBehalfTicket + 862
frame #2: 0x00007ffecc3e257d kernel32.dll`BaseThreadInitThunk + 29
frame #3: 0x00007ffecc76af28 ntdll.dll`RtlUserThreadStart + 40
```

If we print the list of threads each time the test is run, we notice
that threads are sometimes in a different order, within process.threads:

```sh
Thread 0: thread #4: tid = 0x9c38, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 1: thread #2: tid = 0xa950, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 2: thread #1: tid = 0xab18, 0x00007ff64bc81001 a.out`func_inner at main.c:2:9, stop reason = breakpoint 1.1
Thread 3: thread #3: tid = 0xc514, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20

Thread 0: thread #3: tid = 0x018c, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 1: thread #1: tid = 0x85c8, 0x00007ff7130c1001 a.out`func_inner at main.c:2:9, stop reason = breakpoint 1.1
Thread 2: thread #2: tid = 0xf344, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
Thread 3: thread #4: tid = 0x6a50, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
```

We're interested in whichever thread is executing a.out. If we print
more info, we can see that this thread always has an IndexID of 1
regardless of where it appars in process.threads:

```sh
Thread 0: thread #3: tid = 0x2474, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
-- GetName:
-- GetQueueName: None
-- GetQueueID: 0
-- GetIndexId: 3
Thread 1: thread #2: tid = 0x2864, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
-- GetName:
-- GetQueueName: None
-- GetQueueID: 0
-- GetIndexId: 2
Thread 2: thread #4: tid = 0x9c90, 0x00007ffecc7b3bf4 ntdll.dll`NtWaitForWorkViaWorkerFactory + 20
-- GetName:
-- GetQueueName: None
-- GetQueueID: 0
-- GetIndexId: 4
Thread 3: thread #1: tid = 0xebbc, 0x00007ff643331001 a.out`func_inner at main.c:2:9, stop reason = breakpoint 1.1
-- GetName:
-- GetQueueName: None
-- GetQueueID: 0
-- GetIndexId: 1
```

By switching from `process.GetThreadAtIndex` to
`process.GetThreadByIndex` we consistently retrieve the correct thread.

This raises a few more questions that might lead to a different
followup solution:

1. Why does our target thread always have an IndexID of 1? Why not 0? Or
   any other value.
2. Why are process.threads non-deterministically ordered?
---
 .../functionalities/unwind/zeroth_frame/TestZerothFrame.py | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py b/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py
index f4e883d314644..7e4078bbe887f 100644
--- a/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py
+++ b/lldb/test/API/functionalities/unwind/zeroth_frame/TestZerothFrame.py
@@ -53,14 +53,15 @@ def test(self):
         process = target.LaunchSimple(None, None, self.get_process_working_directory())
         self.assertTrue(process, VALID_PROCESS)
 
-        thread = process.GetThreadAtIndex(0)
+        thread = process.GetThreadByIndexID(1)
         if self.TraceOn():
             print("Backtrace at the first breakpoint:")
             for f in thread.frames:
                 print(f)
+
         # Check that we have stopped at correct breakpoint.
         self.assertEqual(
-            process.GetThreadAtIndex(0).frame[0].GetLineEntry().GetLine(),
+            thread.frame[0].GetLineEntry().GetLine(),
             bp1_line,
             "LLDB reported incorrect line number.",
         )
@@ -70,7 +71,7 @@ def test(self):
         # 'continue' command.
         process.Continue()
 
-        thread = process.GetThreadAtIndex(0)
+        thread = process.GetThreadByIndexID(1)
         if self.TraceOn():
             print("Backtrace at the second breakpoint:")
             for f in thread.frames:



More information about the lldb-commits mailing list