[Lldb-commits] [lldb] e8dc8d6 - Add new Python API `SBCommandInterpreter::GetTranscript()` (#90703)

Mon May 20 15:49:49 PDT 2024

Author: royitaqi
Date: 2024-05-20T15:49:46-07:00
New Revision: e8dc8d614ada201e250fbf075241b2b6180943b5

URL: https://github.com/llvm/llvm-project/commit/e8dc8d614ada201e250fbf075241b2b6180943b5
DIFF: https://github.com/llvm/llvm-project/commit/e8dc8d614ada201e250fbf075241b2b6180943b5.diff

LOG: Add new Python API `SBCommandInterpreter::GetTranscript()` (#90703)

# Motivation

Currently, the user can already get the "transcript" (for "what is the
transcript", see `CommandInterpreter::SaveTranscript`). However, the
only way to obtain the transcript data as a user is to first destroy the
debugger, then read the save directory. Note that destroy-callbacks
cannot be used, because 1\ transcript data is private to the command
interpreter (see `CommandInterpreter.h`), and 2\ the writing of the
transcript is *after* the invocation of destory-callbacks (see
`Debugger::Destroy`).

So basically, there is no way to obtain the transcript:
* during the lifetime of a debugger (including the destroy-callbacks,
which often performs logging tasks, where the transcript can be useful)
* without relying on external storage

In theory, there are other ways for user to obtain transcript data
during a debugger's life cycle:
* Use Python API and intercept commands and results.
* Use CLI and record console input/output.

However, such ways rely on the client's setup and are not supported
natively by LLDB.


# Proposal

Add a new Python API `SBCommandInterpreter::GetTranscript()`.

Goals:
* It can be called at any time during the debugger's life cycle,
including in destroy-callbacks.
* It returns data in-memory.

Structured data:
* To make data processing easier, the return type is `SBStructuredData`.
See comments in code for how the data is organized.
* In the future, `SaveTranscript` can be updated to write different
formats using such data (e.g. JSON). This is probably accompanied by a
new setting (e.g. `interpreter.save-session-format`).

# Alternatives

The return type can also be `std::vector<std::pair<std::string,
SBCommandReturnObject>>`. This will make implementation easier, without
having to translate it to `SBStructuredData`. On the other hand,
`SBStructuredData` can convert to JSON easily, so it's more convenient
for user to process.

# Privacy

Both user commands and output/error in the transcript can contain
privacy data. However, as mentioned, the transcript is already available
to the user. The addition of the new API doesn't increase the level of
risk. In fact, it _lowers_ the risk of privacy data being leaked later
on, by avoiding writing such data to external storage.

Once the user (or their code) gets the transcript, it will be their
responsibility to make sure that any required privacy policies are
guaranteed.

# Tests

```
bin/llvm-lit -sv ../external/llvm-project/lldb/test/API/python_api/interpreter/TestCommandInterpreterAPI.py
```

```
bin/llvm-lit -sv ../external/llvm-project/lldb/test/API/commands/session/save/TestSessionSave.py
```

---------

Co-authored-by: Roy Shi <royshi at meta.com>
Co-authored-by: Med Ismail Bennani <ismail at bennani.ma>

Added: 
    

Modified: 
    lldb/include/lldb/API/SBCommandInterpreter.h
    lldb/include/lldb/Interpreter/CommandInterpreter.h
    lldb/source/API/SBCommandInterpreter.cpp
    lldb/source/Interpreter/CommandInterpreter.cpp
    lldb/source/Interpreter/InterpreterProperties.td
    lldb/test/API/commands/session/save/TestSessionSave.py
    lldb/test/API/python_api/interpreter/TestCommandInterpreterAPI.py
    lldb/test/API/python_api/interpreter/main.c

Removed: 
    


################################################################################
diff  --git a/lldb/include/lldb/API/SBCommandInterpreter.h b/lldb/include/lldb/API/SBCommandInterpreter.h
index ba2e049204b8e..8ac36344b3a79 100644

--- a/lldb/include/lldb/API/SBCommandInterpreter.h
+++ b/lldb/include/lldb/API/SBCommandInterpreter.h
@@ -318,6 +318,14 @@ class SBCommandInterpreter {
 
   SBStructuredData GetStatistics();
 
+  /// Returns a list of handled commands, output and error. Each element in
+  /// the list is a dictionary with the following keys/values:
+  /// - "command" (string): The command that was executed.
+  /// - "output" (string): The output of the command. Empty ("") if no output.
+  /// - "error" (string): The error of the command. Empty ("") if no error.
+  /// - "seconds" (float): The time it took to execute the command.
+  SBStructuredData GetTranscript();
+
 protected:
   friend class lldb_private::CommandPluginInterfaceImplementation;
 

diff  --git a/lldb/include/lldb/Interpreter/CommandInterpreter.h b/lldb/include/lldb/Interpreter/CommandInterpreter.h
index 70a55a77465bf..ccc30cf4f1a82 100644
--- a/lldb/include/lldb/Interpreter/CommandInterpreter.h
+++ b/lldb/include/lldb/Interpreter/CommandInterpreter.h
@@ -22,6 +22,7 @@
 #include "lldb/Utility/Log.h"
 #include "lldb/Utility/StreamString.h"
 #include "lldb/Utility/StringList.h"
+#include "lldb/Utility/StructuredData.h"
 #include "lldb/lldb-forward.h"
 #include "lldb/lldb-private.h"
 
@@ -560,6 +561,9 @@ class CommandInterpreter : public Broadcaster,
   bool GetPromptOnQuit() const;
   void SetPromptOnQuit(bool enable);
 
+  bool GetSaveTranscript() const;
+  void SetSaveTranscript(bool enable);
+
   bool GetSaveSessionOnQuit() const;
   void SetSaveSessionOnQuit(bool enable);
 
@@ -647,6 +651,7 @@ class CommandInterpreter : public Broadcaster,
   }
 
   llvm::json::Value GetStatistics();
+  const StructuredData::Array &GetTranscript() const;
 
 protected:
   friend class Debugger;
@@ -765,7 +770,20 @@ class CommandInterpreter : public Broadcaster,
   typedef llvm::StringMap<uint64_t> CommandUsageMap;
   CommandUsageMap m_command_usages;
 
+  /// Turn on settings `interpreter.save-transcript` for LLDB to populate
+  /// this stream. Otherwise this stream is empty.
   StreamString m_transcript_stream;
+
+  /// Contains a list of handled commands and their details. Each element in
+  /// the list is a dictionary with the following keys/values:
+  /// - "command" (string): The command that was executed.
+  /// - "output" (string): The output of the command. Empty ("") if no output.
+  /// - "error" (string): The error of the command. Empty ("") if no error.
+  /// - "seconds" (float): The time it took to execute the command.
+  ///
+  /// Turn on settings `interpreter.save-transcript` for LLDB to populate
+  /// this list. Otherwise this list is empty.
+  StructuredData::Array m_transcript;
 };
 
 } // namespace lldb_private

diff  --git a/lldb/source/API/SBCommandInterpreter.cpp b/lldb/source/API/SBCommandInterpreter.cpp
index 83c0951c56db6..7a35473283684 100644
--- a/lldb/source/API/SBCommandInterpreter.cpp
+++ b/lldb/source/API/SBCommandInterpreter.cpp
@@ -6,6 +6,7 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "lldb/Utility/StructuredData.h"
 #include "lldb/lldb-types.h"
 
 #include "lldb/Interpreter/CommandInterpreter.h"
@@ -571,6 +572,21 @@ SBStructuredData SBCommandInterpreter::GetStatistics() {
   return data;
 }
 
+SBStructuredData SBCommandInterpreter::GetTranscript() {
+  LLDB_INSTRUMENT_VA(this);
+
+  SBStructuredData data;
+  if (IsValid())
+    // A deep copy is performed by `std::make_shared` on the
+    // `StructuredData::Array`, via its implicitly-declared copy constructor.
+    // This ensures thread-safety between the user changing the returned
+    // `SBStructuredData` and the `CommandInterpreter` changing its internal
+    // `m_transcript`.
+    data.m_impl_up->SetObjectSP(
+        std::make_shared<StructuredData::Array>(m_opaque_ptr->GetTranscript()));
+  return data;
+}
+
 lldb::SBCommand SBCommandInterpreter::AddMultiwordCommand(const char *name,
                                                           const char *help) {
   LLDB_INSTRUMENT_VA(this, name, help);

diff  --git a/lldb/source/Interpreter/CommandInterpreter.cpp b/lldb/source/Interpreter/CommandInterpreter.cpp
index 4c58ecc3c1848..811726e30af4d 100644
--- a/lldb/source/Interpreter/CommandInterpreter.cpp
+++ b/lldb/source/Interpreter/CommandInterpreter.cpp
@@ -51,6 +51,7 @@
 #include "lldb/Utility/Log.h"
 #include "lldb/Utility/State.h"
 #include "lldb/Utility/Stream.h"
+#include "lldb/Utility/StructuredData.h"
 #include "lldb/Utility/Timer.h"
 
 #include "lldb/Host/Config.h"
@@ -161,6 +162,17 @@ void CommandInterpreter::SetPromptOnQuit(bool enable) {
   SetPropertyAtIndex(idx, enable);
 }
 
+bool CommandInterpreter::GetSaveTranscript() const {
+  const uint32_t idx = ePropertySaveTranscript;
+  return GetPropertyAtIndexAs<bool>(
+      idx, g_interpreter_properties[idx].default_uint_value != 0);
+}
+
+void CommandInterpreter::SetSaveTranscript(bool enable) {
+  const uint32_t idx = ePropertySaveTranscript;
+  SetPropertyAtIndex(idx, enable);
+}
+
 bool CommandInterpreter::GetSaveSessionOnQuit() const {
   const uint32_t idx = ePropertySaveSessionOnQuit;
   return GetPropertyAtIndexAs<bool>(
@@ -1889,7 +1901,16 @@ bool CommandInterpreter::HandleCommand(const char *command_line,
   else
     add_to_history = (lazy_add_to_history == eLazyBoolYes);
 
-  m_transcript_stream << "(lldb) " << command_line << '\n';
+  // The same `transcript_item` will be used below to add output and error of
+  // the command.
+  StructuredData::DictionarySP transcript_item;
+  if (GetSaveTranscript()) {
+    m_transcript_stream << "(lldb) " << command_line << '\n';
+
+    transcript_item = std::make_shared<StructuredData::Dictionary>();
+    transcript_item->AddStringItem("command", command_line);
+    m_transcript.AddItem(transcript_item);
+  }
 
   bool empty_command = false;
   bool comment_command = false;
@@ -1994,7 +2015,7 @@ bool CommandInterpreter::HandleCommand(const char *command_line,
   // Take care of things like setting up the history command & calling the
   // appropriate Execute method on the CommandObject, with the appropriate
   // arguments.
-
+  StatsDuration execute_time;
   if (cmd_obj != nullptr) {
     bool generate_repeat_command = add_to_history;
     // If we got here when empty_command was true, then this command is a
@@ -2035,14 +2056,24 @@ bool CommandInterpreter::HandleCommand(const char *command_line,
         log, "HandleCommand, command line after removing command name(s): '%s'",
         remainder.c_str());
 
+    ElapsedTime elapsed(execute_time);
     cmd_obj->Execute(remainder.c_str(), result);
   }
 
   LLDB_LOGF(log, "HandleCommand, command %s",
             (result.Succeeded() ? "succeeded" : "did not succeed"));
 
-  m_transcript_stream << result.GetOutputData();
-  m_transcript_stream << result.GetErrorData();
+  // To test whether or not transcript should be saved, `transcript_item` is
+  // used instead of `GetSaveTrasncript()`. This is because the latter will
+  // fail when the command is "settings set interpreter.save-transcript true".
+  if (transcript_item) {
+    m_transcript_stream << result.GetOutputData();
+    m_transcript_stream << result.GetErrorData();
+
+    transcript_item->AddStringItem("output", result.GetOutputData());
+    transcript_item->AddStringItem("error", result.GetErrorData());
+    transcript_item->AddFloatItem("seconds", execute_time.get().count());
+  }
 
   return result.Succeeded();
 }
@@ -3554,3 +3585,7 @@ llvm::json::Value CommandInterpreter::GetStatistics() {
     stats.try_emplace(command_usage.getKey(), command_usage.getValue());
   return stats;
 }
+
+const StructuredData::Array &CommandInterpreter::GetTranscript() const {
+  return m_transcript;
+}

diff  --git a/lldb/source/Interpreter/InterpreterProperties.td b/lldb/source/Interpreter/InterpreterProperties.td
index 2155ee61ccffb..a5fccbbca091c 100644
--- a/lldb/source/Interpreter/InterpreterProperties.td
+++ b/lldb/source/Interpreter/InterpreterProperties.td
@@ -9,6 +9,10 @@ let Definition = "interpreter" in {
     Global,
     DefaultTrue,
     Desc<"If true, LLDB will prompt you before quitting if there are any live processes being debugged. If false, LLDB will quit without asking in any case.">;
+  def SaveTranscript: Property<"save-transcript", "Boolean">,
+    Global,
+    DefaultFalse,
+    Desc<"If true, commands will be saved into a transcript buffer for user access.">;
   def SaveSessionOnQuit: Property<"save-session-on-quit", "Boolean">,
     Global,
     DefaultFalse,

diff  --git a/lldb/test/API/commands/session/save/TestSessionSave.py b/lldb/test/API/commands/session/save/TestSessionSave.py
index 172a764523046..98985c66010bb 100644
--- a/lldb/test/API/commands/session/save/TestSessionSave.py
+++ b/lldb/test/API/commands/session/save/TestSessionSave.py
@@ -25,6 +25,12 @@ def test_session_save(self):
         raw = ""
         interpreter = self.dbg.GetCommandInterpreter()
 
+        # Make sure "save-transcript" is on, so that all the following setings
+        # and commands are saved into the trasncript. Note that this cannot be
+        # a part of the `settings`, because this command itself won't be saved
+        # into the transcript.
+        self.runCmd("settings set interpreter.save-transcript true")
+
         settings = [
             "settings set interpreter.echo-commands true",
             "settings set interpreter.echo-comment-commands true",
@@ -95,6 +101,12 @@ def test_session_save_on_quit(self):
         raw = ""
         interpreter = self.dbg.GetCommandInterpreter()
 
+        # Make sure "save-transcript" is on, so that all the following setings
+        # and commands are saved into the trasncript. Note that this cannot be
+        # a part of the `settings`, because this command itself won't be saved
+        # into the transcript.
+        self.runCmd("settings set interpreter.save-transcript true")
+
         td = tempfile.TemporaryDirectory()
 
         settings = [

diff  --git a/lldb/test/API/python_api/interpreter/TestCommandInterpreterAPI.py b/lldb/test/API/python_api/interpreter/TestCommandInterpreterAPI.py
index 8f9fbfc255bb0..95643eef0d344 100644
--- a/lldb/test/API/python_api/interpreter/TestCommandInterpreterAPI.py
+++ b/lldb/test/API/python_api/interpreter/TestCommandInterpreterAPI.py
@@ -1,5 +1,6 @@
 """Test the SBCommandInterpreter APIs."""
 
+import json
 import lldb
 from lldbsuite.test.decorators import *
 from lldbsuite.test.lldbtest import *
@@ -15,8 +16,7 @@ def setUp(self):
         # Find the line number to break on inside main.cpp.
         self.line = line_number("main.c", "Hello world.")
 
-    def test_with_process_launch_api(self):
-        """Test the SBCommandInterpreter APIs."""
+    def buildAndCreateTarget(self):
         self.build()
         exe = self.getBuildArtifact("a.out")
 
@@ -27,6 +27,11 @@ def test_with_process_launch_api(self):
         # Retrieve the associated command interpreter from our debugger.
         ci = self.dbg.GetCommandInterpreter()
         self.assertTrue(ci, VALID_COMMAND_INTERPRETER)
+        return ci
+
+    def test_with_process_launch_api(self):
+        """Test the SBCommandInterpreter APIs."""
+        ci = self.buildAndCreateTarget()
 
         # Exercise some APIs....
 
@@ -85,3 +90,166 @@ def test_command_output(self):
         self.assertEqual(res.GetOutput(), "")
         self.assertIsNotNone(res.GetError())
         self.assertEqual(res.GetError(), "")
+
+    def getTranscriptAsPythonObject(self, ci):
+        """Retrieve the transcript and convert it into a Python object"""
+        structured_data = ci.GetTranscript()
+        self.assertTrue(structured_data.IsValid())
+
+        stream = lldb.SBStream()
+        self.assertTrue(stream)
+
+        error = structured_data.GetAsJSON(stream)
+        self.assertSuccess(error)
+
+        return json.loads(stream.GetData())
+
+    def test_structured_transcript(self):
+        """Test structured transcript generation and retrieval."""
+        ci = self.buildAndCreateTarget()
+
+        # Make sure the "save-transcript" setting is on
+        self.runCmd("settings set interpreter.save-transcript true")
+
+        # Send a few commands through the command interpreter.
+        #
+        # Using `ci.HandleCommand` because some commands will fail so that we
+        # can test the "error" field in the saved transcript.
+        res = lldb.SBCommandReturnObject()
+        ci.HandleCommand("version", res)
+        ci.HandleCommand("an-unknown-command", res)
+        ci.HandleCommand("breakpoint set -f main.c -l %d" % self.line, res)
+        ci.HandleCommand("r", res)
+        ci.HandleCommand("p a", res)
+        ci.HandleCommand("statistics dump", res)
+        total_number_of_commands = 6
+
+        # Get transcript as python object
+        transcript = self.getTranscriptAsPythonObject(ci)
+
+        # All commands should have expected fields.
+        for command in transcript:
+            self.assertIn("command", command)
+            self.assertIn("output", command)
+            self.assertIn("error", command)
+            self.assertIn("seconds", command)
+
+        # The following validates individual commands in the transcript.
+        #
+        # Notes:
+        # 1. Some of the asserts rely on the exact output format of the
+        #    commands. Hopefully we are not changing them any time soon.
+        # 2. We are removing the "seconds" field from each command, so that
+        #    some of the validations below can be easier / more readable.
+        for command in transcript:
+            del(command["seconds"])
+
+        # (lldb) version
+        self.assertEqual(transcript[0]["command"], "version")
+        self.assertIn("lldb version", transcript[0]["output"])
+        self.assertEqual(transcript[0]["error"], "")
+
+        # (lldb) an-unknown-command
+        self.assertEqual(transcript[1],
+            {
+                "command": "an-unknown-command",
+                "output": "",
+                "error": "error: 'an-unknown-command' is not a valid command.\n",
+            })
+
+        # (lldb) breakpoint set -f main.c -l <line>
+        self.assertEqual(transcript[2]["command"], "breakpoint set -f main.c -l %d" % self.line)
+        # Breakpoint 1: where = a.out`main + 29 at main.c:5:3, address = 0x0000000100000f7d
+        self.assertIn("Breakpoint 1: where = a.out`main ", transcript[2]["output"])
+        self.assertEqual(transcript[2]["error"], "")
+
+        # (lldb) r
+        self.assertEqual(transcript[3]["command"], "r")
+        # Process 25494 launched: '<path>/TestCommandInterpreterAPI.test_structured_transcript/a.out' (x86_64)
+        self.assertIn("Process", transcript[3]["output"])
+        self.assertIn("launched", transcript[3]["output"])
+        self.assertEqual(transcript[3]["error"], "")
+
+        # (lldb) p a
+        self.assertEqual(transcript[4],
+            {
+                "command": "p a",
+                "output": "(int) 123\n",
+                "error": "",
+            })
+
+        # (lldb) statistics dump
+        statistics_dump = json.loads(transcript[5]["output"])
+        # Dump result should be valid JSON
+        self.assertTrue(statistics_dump is not json.JSONDecodeError)
+        # Dump result should contain expected fields
+        self.assertIn("commands", statistics_dump)
+        self.assertIn("memory", statistics_dump)
+        self.assertIn("modules", statistics_dump)
+        self.assertIn("targets", statistics_dump)
+
+    def test_save_transcript_setting_default(self):
+        ci = self.buildAndCreateTarget()
+        res = lldb.SBCommandReturnObject()
+
+        # The setting's default value should be "false"
+        self.runCmd("settings show interpreter.save-transcript", "interpreter.save-transcript (boolean) = false\n")
+        # self.assertEqual(res.GetOutput(), )
+
+    def test_save_transcript_setting_off(self):
+        ci = self.buildAndCreateTarget()
+
+        # Make sure the setting is off
+        self.runCmd("settings set interpreter.save-transcript false")
+
+        # The transcript should be empty after running a command
+        self.runCmd("version")
+        transcript = self.getTranscriptAsPythonObject(ci)
+        self.assertEqual(transcript, [])
+
+    def test_save_transcript_setting_on(self):
+        ci = self.buildAndCreateTarget()
+        res = lldb.SBCommandReturnObject()
+
+        # Make sure the setting is on
+        self.runCmd("settings set interpreter.save-transcript true")
+
+        # The transcript should contain one item after running a command
+        self.runCmd("version")
+        transcript = self.getTranscriptAsPythonObject(ci)
+        self.assertEqual(len(transcript), 1)
+        self.assertEqual(transcript[0]["command"], "version")
+
+    def test_save_transcript_returns_copy(self):
+        """
+        Test that the returned structured data is *at least* a shallow copy.
+
+        We believe that a deep copy *is* performed in `SBCommandInterpreter::GetTranscript`.
+        However, the deep copy cannot be tested and doesn't need to be tested,
+        because there is no logic in the command interpreter to modify a
+        transcript item (representing a command) after it has been returned.
+        """
+        ci = self.buildAndCreateTarget()
+
+        # Make sure the setting is on
+        self.runCmd("settings set interpreter.save-transcript true")
+
+        # Run commands and get the transcript as structured data
+        self.runCmd("version")
+        structured_data_1 = ci.GetTranscript()
+        self.assertTrue(structured_data_1.IsValid())
+        self.assertEqual(structured_data_1.GetSize(), 1)
+        self.assertEqual(structured_data_1.GetItemAtIndex(0).GetValueForKey("command").GetStringValue(100), "version")
+
+        # Run some more commands and get the transcript as structured data again
+        self.runCmd("help")
+        structured_data_2 = ci.GetTranscript()
+        self.assertTrue(structured_data_2.IsValid())
+        self.assertEqual(structured_data_2.GetSize(), 2)
+        self.assertEqual(structured_data_2.GetItemAtIndex(0).GetValueForKey("command").GetStringValue(100), "version")
+        self.assertEqual(structured_data_2.GetItemAtIndex(1).GetValueForKey("command").GetStringValue(100), "help")
+
+        # Now, the first structured data should remain unchanged
+        self.assertTrue(structured_data_1.IsValid())
+        self.assertEqual(structured_data_1.GetSize(), 1)
+        self.assertEqual(structured_data_1.GetItemAtIndex(0).GetValueForKey("command").GetStringValue(100), "version")

diff  --git a/lldb/test/API/python_api/interpreter/main.c b/lldb/test/API/python_api/interpreter/main.c
index 277aa54a4eea5..366ffde5fdef5 100644
--- a/lldb/test/API/python_api/interpreter/main.c
+++ b/lldb/test/API/python_api/interpreter/main.c
@@ -1,6 +1,7 @@
 #include <stdio.h>
 
 int main(int argc, char const *argv[]) {
-    printf("Hello world.\n");
-    return 0;
+  int a = 123;
+  printf("Hello world.\n");
+  return 0;
 }