[llvm] [Dexter] Work around flaky LLDB DAP stackTrace response (PR #157090)

Fri Sep 5 05:23:04 PDT 2025

https://github.com/OCHyams created https://github.com/llvm/llvm-project/pull/157090

Buildbot cross-project-tests-sie-ubuntu sees sporadic test failures due to missing "stackTrace" "source" "path". The "path" field is optional for "source" according to DAP, so the response is well formed. It works _most_ of the time, and no single test fails consistently, which is strangely inconsistent. Examples of affected runs:

* feature_tests/subtools/test/target_run_args_with_command.c - https://lab.llvm.org/buildbot/#/builders/181/builds/27287/steps/6/logs/stdio
* feature_tests/commands/perfect/dex_declare_address/address_after_ref.cpp - https://lab.llvm.org/buildbot/#/builders/181/builds/27321/steps/6/logs/stdio
* feature_tests/commands/perfect/command_line.c - https://lab.llvm.org/buildbot/#/builders/181/builds/27268/steps/6/logs/stdio
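
For illustration, here is a minimal sketch of reading the top frame's path without assuming the optional fields are present. The response dictionaries are hand-written examples rather than captured LLDB output, and `frame_source_path` is a hypothetical helper that is not part of Dexter.

```python
from typing import Optional


def frame_source_path(trace_response: dict) -> Optional[str]:
    """Return the top frame's source path, or None if it was omitted."""
    frames = trace_response.get("body", {}).get("stackFrames", [])
    if not frames:
        return None
    # DAP marks both a frame's "source" and a source's "path" as optional,
    # so use .get() instead of indexing to avoid a KeyError.
    return frames[0].get("source", {}).get("path")


# Hand-written example responses (assumed shapes, not real LLDB output).
with_path = {
    "success": True,
    "body": {
        "stackFrames": [
            {
                "id": 1,
                "name": "main",
                "line": 3,
                "column": 1,
                "source": {"name": "test.c", "path": "/tmp/test.c"},
                "instructionPointerReference": "0x1000",
            }
        ]
    },
}
without_path = {
    "success": True,
    "body": {
        "stackFrames": [
            {
                "id": 1,
                "name": "main",
                "line": 3,
                "column": 1,
                "source": {"name": "test.c"},
                "instructionPointerReference": "0x1000",
            }
        ]
    },
}

print(frame_source_path(with_path))
print(frame_source_path(without_path))
```

Indexing with `["source"]["path"]`, as Dexter currently does, raises `KeyError` on the second response, which is the failure the bot reports.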

I haven't been able to replicate the failures locally, even after running the feature_tests in a loop for 3 hours, and haven't been able to work out why the "source" is sometimes missing by just looking at LLDB code.

So, instead, here is a plaster that I am hoping will improve bot consistency:

* Attempt to get the stack frames with source paths 3 times before giving up.

It would be ideal if we didn't need to do any of this. I think `_post_step_hook` could be removed if the behaviour in gh#156650 was fixed/changed.
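
The intended retry behaviour can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the code in the patch: `retry_on_key_error` and `flaky_fetch` are hypothetical names, and the attempt count and delay simply mirror the values described above.

```python
import time


def retry_on_key_error(fetch, attempts=3, delay=0.1):
    """Call fetch() up to `attempts` times, retrying when it raises KeyError."""
    for attempt in range(1, attempts + 1):
        try:
            # Return immediately on the first successful call.
            return fetch()
        except KeyError:
            if attempt == attempts:
                # Out of attempts: propagate the last KeyError.
                raise
            time.sleep(delay)


# Hypothetical stand-in for a request whose first two responses lack a key.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise KeyError("path")
    return ("/tmp/test.c", "0x1000")


path, addr = retry_on_key_error(flaky_fetch, attempts=3, delay=0.0)
print(path, addr, calls["count"])
```

Two details matter for this pattern: the loop has to stop as soon as a call succeeds, and the final failed attempt has to re-raise rather than fall through with `path` and `addr` unset.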

From 9c6de5b0c35d3b968afa9d0478bbd136c2113334 Mon Sep 17 00:00:00 2001
From: Orlando Cazalet-Hyams <orlando.hyams at sony.com>
Date: Fri, 5 Sep 2025 13:12:00 +0100
Subject: [PATCH] [Dexter] Work around flaky LLDB DAP stackTrace response

Buildbot cross-project-tests-sie-ubuntu sees sporadic test failures due to
missing "stackTrace" "source" "path". The "path" field is optional for "source"
according to DAP, so the response is well formed. It works _most_ of the time,
and no single test fails consistently, which is strangely inconsistent.

I can't replicate the failure locally after running the feature_tests in a loop
for 3 hours, and haven't been able to work out why the "source" is sometimes
missing by just looking at LLDB code.

So, instead, here is a plaster that I am hoping will improve bot consistency.

Attempt to get the stack frames with source paths 3 times before giving up.

It would be ideal if we didn't need to do any of this. I think `_post_step_hook`
could be removed if the behaviour in gh#156650 was fixed/changed.
---
 .../dexter/dex/debugger/lldb/LLDB.py          | 42 ++++++++++++++-----
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/cross-project-tests/debuginfo-tests/dexter/dex/debugger/lldb/LLDB.py b/cross-project-tests/debuginfo-tests/dexter/dex/debugger/lldb/LLDB.py
index fa10b4914d45c..dde2a1959c1ea 100644
--- a/cross-project-tests/debuginfo-tests/dexter/dex/debugger/lldb/LLDB.py
+++ b/cross-project-tests/debuginfo-tests/dexter/dex/debugger/lldb/LLDB.py
@@ -11,6 +11,7 @@
 import shlex
 from subprocess import CalledProcessError, check_output, STDOUT
 import sys
+import time
 
 from dex.debugger.DebuggerBase import DebuggerBase, watch_is_active
 from dex.debugger.DAP import DAP
@@ -419,20 +420,39 @@ def frames_below_main(self):
             "_start",
         ]
 
+    def _get_current_path_and_addr(self):
+        trace_req_id = self.send_message(
+            self.make_request(
+                "stackTrace", {"threadId": self._debugger_state.thread, "levels": 1}
+            )
+        )
+        trace_response = self._await_response(trace_req_id)
+        if not trace_response["success"]:
+            raise DebuggerException("failed to get stack frames")
+        stackframes = trace_response["body"]["stackFrames"]
+        path = stackframes[0]["source"]["path"]
+        addr = stackframes[0]["instructionPointerReference"]
+        return (path, addr)
+
     def _post_step_hook(self):
         """Hook to be executed after completing a step request."""
         if self._debugger_state.stopped_reason == "step":
-            trace_req_id = self.send_message(
-                self.make_request(
-                    "stackTrace", {"threadId": self._debugger_state.thread, "levels": 1}
-                )
-            )
-            trace_response = self._await_response(trace_req_id)
-            if not trace_response["success"]:
-                raise DebuggerException("failed to get stack frames")
-            stackframes = trace_response["body"]["stackFrames"]
-            path = stackframes[0]["source"]["path"]
-            addr = stackframes[0]["instructionPointerReference"]
+            # Buildbot cross-project-tests-sie-ubuntu sees sporadic test
+            # failures due to missing stackFrames[0].source.path. The "path"
+            # field is optional for "source" according to DAP, so it's not
+            # ill-formed. But it works most of the time, and doesn't
+            # consistently fail for any one test. Attempt to get the stack
+            # frames with source paths 3 times before giving up.
+            # FIXME: It would be ideal if we didn't need to do any of this.
+            # This entire function could be removed if gh#156650 gets resolved.
+            for attempt in range(1, 4):
+                try:
+                    path, addr = self._get_current_path_and_addr()
+                    break
+                except KeyError as e:
+                    if attempt == 3:
+                        raise e
+                    time.sleep(0.1)
+
             if any(
                 self._debugger_state.bp_addr_map.get(self.dex_id_to_dap_id[dex_bp_id])
                 == addr