[Lldb-commits] [lldb] [lldb][Docs] Additions to debuging LLDB page (PR #65635)
David Spickett via lldb-commits
lldb-commits at lists.llvm.org
Thu Sep 7 09:55:01 PDT 2023
https://github.com/DavidSpickett updated https://github.com/llvm/llvm-project/pull/65635:
>From 7c511c4beb3258894a5b9ceb884b5469b00368c0 Mon Sep 17 00:00:00 2001
From: David Spickett <david.spickett at linaro.org>
Date: Thu, 7 Sep 2023 11:05:36 +0100
Subject: [PATCH 1/2] [lldb][Docs] Additions to debuging LLDB page
Adds the following:
* A note that you can use attaching to debug the
right lldb-server process, though there are drawbacks.
* A section on debugging the remote protocol.
* Reducing bugs, including reducing ptrace bugs to
remove the need for LLDB.
I've added a standlone ptrace program to the examples
folder because:
* There's no better place to put it.
* Adding it to the page seems like wasting space,
and would be harder to update.
* I link to Eli Bendersky's classic blog on the
subject, but we are safer with our own example
as well.
* Eli's example is for 32 bit Intel, AArch64 is
more common these days.
* It's easier to show the software breakpoint
steps in code than explain it (though I still
do that in the text).
* It was living on my laptop not helping anyone
so I think it's good to have it upstream
for others, including future me.
---
lldb/docs/resources/debugging.rst | 303 ++++++++++++++++++++++++++++++
lldb/examples/ptrace_example.c | 106 +++++++++++
2 files changed, 409 insertions(+)
create mode 100644 lldb/examples/ptrace_example.c
diff --git a/lldb/docs/resources/debugging.rst b/lldb/docs/resources/debugging.rst
index 0cd310e079c23f8..e5dd2cfb054ff35 100644
--- a/lldb/docs/resources/debugging.rst
+++ b/lldb/docs/resources/debugging.rst
@@ -195,6 +195,11 @@ automatically debug the ``gdbserver`` process as it's created. However this
author has not been able to get either to work in this scenario so we suggest
making a more specific command wherever possible instead.
+Another option is to let ``lldb-server`` start up, then attach to the process
+that's interesting to you. It's less automated and won't work if the bug occurs
+during startup. However it is a good way to know you've found the right one,
+then you can take its command line and run that directly.
+
Output From ``lldb-server``
***************************
@@ -258,3 +263,301 @@ then ``lldb B`` to trigger ``lldb-server B`` to go into that code and hit the
breakpoint. ``lldb-server A`` is only here to let us debug ``lldb-server B``
remotely.
+Debugging The Remote Protocol
+-----------------------------
+
+LLDB mostly follows the `GDB Remote Protocol <https://sourceware.org/gdb/onlinedocs/gdb/Remote-Protocol.html>`_
+. Where there are differences it tries to handle both LLDB and GDB behaviour.
+
+LLDB does have extensions to the protocol which are documented in
+`lldb-gdb-remote.txt <https://github.com/llvm/llvm-project/blob/main/lldb/docs/lldb-gdb-remote.txt>`_
+and `lldb/docs/lldb-platform-packets.txt <https://github.com/llvm/llvm-project/blob/main/lldb/docs/lldb-platform-packets.txt>`_.
+
+Logging Packets
+***************
+
+If you just want to observe packets, you can enable the ``gdb-remote packets``
+log channel.
+
+::
+
+ (lldb) log enable gdb-remote packets
+ (lldb) run
+ lldb < 1> send packet: +
+ lldb history[1] tid=0x264bfd < 1> send packet: +
+ lldb < 19> send packet: $QStartNoAckMode#b0
+ lldb < 1> read packet: +
+
+You can do this on the ``lldb-server`` end as well by passing the option
+``--log-channels "gdb-remote packets"``. Then you'll see both sides of the
+connection.
+
+Some packets may be printed in a nicer way than others. For example XML packets
+will print the literal XML, some binary packets may be decoded. Others will just
+be printed unmodified. So do check what format you expect, a common one is hex
+encoded bytes.
+
+You can enable this logging even when you are connecting to an ``lldb-server``
+in platform mode, this protocol is used for that too.
+
+Debugging Packet Exchanges
+**************************
+
+Say you want to make ``lldb`` send a packet to ``lldb-server``, then debug
+how the latter builds its response. Maybe even see how ``lldb`` handles it once
+it's sent back.
+
+That all takes time, so LLDB will likely time out and think the remote has gone
+away. You can change the ``plugin.process.gdb-remote.packet-timeout`` setting
+to prevent this.
+
+Here's an example, first we'll start an ``lldb-server`` being debugged by
+``lldb``. Placing a breakpoint on a packet handler we know will be hit once
+another ``lldb`` connects.
+
+::
+
+ $ lldb -- lldb-server gdbserver :1234 -- /tmp/test.o
+ <...>
+ (lldb) b GDBRemoteCommunicationServerCommon::Handle_qSupported
+ Breakpoint 1: where = <...>
+ (lldb) run
+ <...>
+
+Next we connect another ``lldb`` to this, with a timeout of 5 minutes:
+
+::
+
+ $ lldb /tmp/test.o
+ <...>
+ (lldb) settings set plugin.process.gdb-remote.packet-timeout 300
+ (lldb) gdb-remote 1234
+
+Doing so triggers the breakpoint in ``lldb-server``, bringing us back into
+``lldb``. Now we've got 5 minutes to do whatever we need before LLDB decides
+the connection has failed.
+
+::
+
+ * thread #1, name = 'lldb-server', stop reason = breakpoint 1.1
+ frame #0: 0x0000aaaaaacc6848 lldb-server<...>
+ lldb-server`lldb_private::process_gdb_remote::GDBRemoteCommunicationServerCommon::Handle_qSupported:
+ -> 0xaaaaaacc6848 <+0>: sub sp, sp, #0xc0
+ <...>
+ (lldb)
+
+Once you're done simply ``continue`` the ``lldb-server``. Back in the other
+``lldb``, the connection process will continue as normal.
+
+::
+
+ Process 2510266 stopped
+ * thread #1, name = 'test.o', stop reason = signal SIGSTOP
+ frame #0: 0x0000fffff7fcd100 ld-2.31.so`_start
+ ld-2.31.so`_start:
+ -> 0xfffff7fcd100 <+0>: mov x0, sp
+ <...>
+ (lldb)
+
+Reducing Bugs
+-------------
+
+This section covers reducing a bug that happens in LLDB itself, or where you
+suspect that LLDB causes something else to behave abnormaly.
+
+Since bugs vary wildly, the advice here is general and incomplete. Let your
+instincts guide you and don't feel the need to try everything before reporting
+an issue or asking for help. This is simply inspiration.
+
+Reduction
+*********
+
+The first step is to reduce uneeded compexity where it is cheap to do so. If
+something is easily removed or frozen to a cerain value, do so. The goal is to
+keep the failure mode the same, with fewer dependencies.
+
+This includes, but is not limited to:
+
+* Removing test cases that don't crash.
+* Replacing dynamic lookups with constant values.
+* Replace supporting functions with stubs that do nothing.
+* Moving the test case to less unqiue system. If your machine has an exotic
+ extension, try it on a readily available commodity machine.
+* Removing irrelevant parts of the test program.
+* Reproducing the issue without using the LLDB test runner.
+* Converting a remote debuging scenario into a local one.
+
+Now we hopefully have a smaller reproducer than we started with. Next we need to
+find out what components of the software stack might be failing.
+
+Some examples are listed below with suggestions for how to investigate them.
+
+* Debugger
+
+ * Use a `released version of LLDB <https://github.com/llvm/llvm-project/releases>`_.
+
+ * If on MacOS, try the system ``lldb``.
+
+ * Try GDB or any other system debugger you might have e.g. Microsoft Visual
+ Studio.
+
+* Kernel
+
+ * Start a virtual machine running a different version. ``qemu-system`` is
+ useful here.
+
+ * Try a different physical system running a different version.
+
+ * Remember that for most kernels, userspace crashing the kernel is always a
+ kernel bug. Even if the userspace program is doing something unconventional.
+ So it could be a bug in the application and the kernel.
+
+* Compiler and compiler options
+
+ * Try other versions of the same compiler or your system compiler.
+
+ * Emit older versions of DWARF info, particularly DWARFv4 to v5, some tools
+ did/do not understand the new constructs.
+
+ * Reduce optimisation options as much as possible.
+
+ * Try all the language modes e.g. C++17/20 for C++.
+
+ * Link against LLVM's libcxx if you suspect a bug involving the system C++
+ library.
+
+ * For languages other than C/C++ e.g. Rust, try making an equivalent program
+ in C/C++. LLDB tends to try to fit other languages into a C/C++ mould, so
+ porting the program can make triage and reporting much easier.
+
+* Operating system
+
+ * Use docker to try various versions of Linux.
+
+ * Use ``qemu-system`` to emulate other operating systems e.g. FreeBSD.
+
+* Architecture
+
+ * Use `QEMU user space emulation <https://www.qemu.org/docs/master/user/main.html>`_
+ to quickly test other architectures. Note that ``lldb-server`` cannot be used
+ with this as the ptrace APIs are not emulated.
+
+ * If you need to test a big endian system use QEMU to emulate s390x (user
+ space emulation for just ``lldb``, ``qemu-system`` for testing
+ ``lldb-server``).
+
+Reducing Ptrace Related Bugs
+****************************
+
+This section is written Linux specific but the same can likely be done on
+other Unix or Unix like operating systems.
+
+Sometimes you will find ``lldb-server`` doing something with ptrace that causes
+a problem. Your reproducer involves running ``lldb`` as well, this is not going
+to go over well with kernel and is generally more difficult to explain if you
+want to get help with it.
+
+If you think you can get your point across without this, no need. If you're
+pretty sure you have for example found a Linux Kernel bug, doing this greatly
+increases the chances it'll get fixed.
+
+We'll remove the LLDB dependency by making a smaller standalone program that
+does the same actions. Starting with a skeleton program that forks and debugs
+the inferior process.
+
+The program presented `here <https://eli.thegreenplace.net/2011/01/23/how-debuggers-work-part-1>`_
+is a great starting point. There is also an AArch64 specific example in
+`the LLDB examples folder <https://github.com/llvm/llvm-project/tree/main/lldb/examples/ptrace_example.c>`_.
+
+For either, you'll need to modify that to fit your architecture. An tip for this
+is to take any constants used in it, find in which function(s) they are used in
+LLDB and then you'll find the equivalent constants in the same LLDB functions
+for your architecture.
+
+Once that is running as expected we can convert ``lldb-server``'s into calls in
+this program. To get a log of those, run ``lldb-server`` with
+``--log-channels "posix ptrace"``. You'll see output like:
+
+::
+
+ $ lldb-server gdbserver :1234 --log-channels "posix ptrace" -- /tmp/test.o
+ 1694099878.829990864 <...> ptrace(16896, 2659963, 0x0000000000000000, 0x000000000000007E, 0)=0x0
+ 1694099878.830722332 <...> ptrace(16900, 2659963, 0x0000FFFFD14BF7CC, 0x0000FFFFD14BF7D0, 16)=0x0
+ 1694099878.831967115 <...> ptrace(16900, 2659963, 0x0000FFFFD14BF66C, 0x0000FFFFD14BF630, 16)=0xffffffffffffffff
+ 1694099878.831982136 <...> ptrace() failed: Invalid argument
+ Launched '/tmp/test.o' as process 2659963...
+
+Each call is logged with its parameters and its result as the ``=`` on the end.
+
+From here you will need to use a combination of the `ptrace documentation <https://man7.org/linux/man-pages/man2/ptrace.2.html>`_
+and Linux Kernel headers (``uapi/linux/ptrace.h`` mainly) to figure out what
+the calls are.
+
+The most important parameter is the first, which is the request number. In the
+example above ``16896``, which is hex ``0x4200``, is ``PTRACE_SETOPTIONS``.
+
+Luckily, you don't usually have to figure out all those early calls. Our
+skeleton program will be doing all that, successfully we hope.
+
+What you should do is record just the interesting bit to you. Let's say
+something odd is happening when you read the ``tpidr`` register (this is an
+AArch64 register, just for example purposes).
+
+First, go to the ``lldb-server`` terminal and press enter a few times to put
+some blank lines after the last logging output.
+
+Then go to your ``lldb`` and:
+
+::
+
+ (lldb) register read tpidr
+ tpidr = 0x0000fffff7fef320
+
+You'll see this from ``lldb-server``:
+
+::
+
+ <...> ptrace(16900, 2659963, 0x0000FFFFD14BF6CC, 0x0000FFFFD14BF710, 8)=0x0
+
+If you don't see that, it may be because ``lldb`` has cached it. The easiest way
+to clear that cache is to step. Remember that some registers are read every
+step, so you'll have to adjust depending on the situation.
+
+Assuming you've got that line, you would look up what ``116900`` is. This is
+``0x4204`` in hex, which is ``PTRACE_GETREGSET``. As we expected.
+
+The following parameters are not as we might expect because what we log is a bit
+different from the literal ptrace call. See your platform's definition of
+``PtraceWrapper`` for the exact form.
+
+The point of all this is that by doing a single action you can get a few
+isolated ptrace calls and you can then fill in the blanks and write
+equivalent calls in the skeleton program.
+
+The final piece of this is likely breakpoints. Assuming your bug does not
+require a hardware breakpoint, you can get software breakpoints by inserting
+a break instruction into the inferior's code at compile time. Usually by using
+an architecture specific assembly statement, as you will need to know exactly
+how many instructions to overwrite later.
+
+Doing it this way instead of exactly copying what LLDB does will save a few
+ptrace calls. The AArch64 example program shows how to do this.
+
+* The inferior contains ``BRK #0`` then ``NOP``.
+* 2 4 byte instructins means 8 bytes of data to replace, which matches the
+ minimum size you can write with ``PTRACE_POKETEXT``.
+* The inferior runs to the ``BRK``, which brings us into the debugger.
+* The debugger reads ``PC`` and writes ``NOP`` then ``NOP`` to the location
+ pointed to by ``PC``.
+* The debugger then single steps the inferior to the next instruction
+ (this is not required in this specific scenario, you could just continue but
+ it is included because this more cloesly matches what ``lldb`` does).
+* The debugger then continues the inferior.
+* The inferior exits, and the whole program exits.
+
+Using this technique you can emulate the usual "run to main, do a thing" type
+reproduction steps.
+
+Finally, that "thing" is the ptrace calls you got from the ``lldb-server`` logs.
+Add those to the debugger function and you now have a reproducer that doesn't
+need any part of LLDB.
diff --git a/lldb/examples/ptrace_example.c b/lldb/examples/ptrace_example.c
new file mode 100644
index 000000000000000..d458ad182e689c7
--- /dev/null
+++ b/lldb/examples/ptrace_example.c
@@ -0,0 +1,106 @@
+//===-- ptrace_example.c --------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include <asm/ptrace.h>
+#include <linux/elf.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <sys/prctl.h>
+#include <sys/ptrace.h>
+#include <sys/uio.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+// The demo program shows how to do basic ptrace operations without lldb
+// or lldb-server. For the purposes of experimentation or reporting bugs
+// in kernels.
+//
+// It is AArch64 Linux specific, adapt as needed.
+//
+// Expected output:
+// Before breakpoint
+// After breakpoint
+
+void inferior() {
+ if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0) {
+ perror("ptrace");
+ return;
+ }
+
+ printf("Before breakpoint\n");
+
+ // Go into debugger. Instruction replaced with nop later.
+ // We write 2 instuctions because POKETEXT works with
+ // 64 bit values and we don't want to overwrite the
+ // call to printf accidentally.
+ asm volatile("BRK #0 \n nop");
+
+ printf("After breakpoint\n");
+}
+
+void debugger(pid_t child) {
+ int wait_status;
+ // Wait until it hits the breakpoint.
+ wait(&wait_status);
+
+ while (WIFSTOPPED(wait_status)) {
+ if (WIFEXITED(wait_status)) {
+ printf("inferior exited normally\n");
+ return;
+ }
+
+ // Read general purpose registers to find the PC value.
+ struct user_pt_regs regs;
+ struct iovec io;
+ io.iov_base = ®s;
+ io.iov_len = sizeof(regs);
+ if (ptrace(PTRACE_GETREGSET, child, NT_PRSTATUS, &io) < 0) {
+ printf("getregset failed\n");
+ return;
+ }
+
+ // Replace brk #0 / nop with nop / nop by writing to memory
+ // at the current PC.
+ uint64_t replace = 0xd503201fd503201f;
+ if (ptrace(PTRACE_POKETEXT, child, regs.pc, replace) < 0) {
+ printf("replacing bkpt failed\n");
+ return;
+ }
+
+ // Single step over where the brk was.
+ if (ptrace(PTRACE_SINGLESTEP, child, 0, 0) < 0) {
+ perror("ptrace");
+ return;
+ }
+
+ // Wait for single step to be done.
+ wait(&wait_status);
+
+ // Run to completion.
+ if (ptrace(PTRACE_CONT, child, 0, 0) < 0) {
+ perror("ptrace");
+ return;
+ }
+
+ // Wait to see that the inferior exited.
+ wait(&wait_status);
+ }
+}
+
+int main() {
+ pid_t child = fork();
+
+ if (child == 0)
+ inferior();
+ else if (child > 0)
+ debugger(child);
+ else
+ return -1;
+
+ return 0;
+}
>From cdbf83973ea85a2eb997ba9c9c2fc8c01a7a86a9 Mon Sep 17 00:00:00 2001
From: David Spickett <david.spickett at linaro.org>
Date: Thu, 7 Sep 2023 17:54:26 +0100
Subject: [PATCH 2/2] Also link directly to the source of the skeleton ptrace
program.
---
lldb/docs/resources/debugging.rst | 1 +
1 file changed, 1 insertion(+)
diff --git a/lldb/docs/resources/debugging.rst b/lldb/docs/resources/debugging.rst
index e5dd2cfb054ff35..2f144d58769e44f 100644
--- a/lldb/docs/resources/debugging.rst
+++ b/lldb/docs/resources/debugging.rst
@@ -466,6 +466,7 @@ does the same actions. Starting with a skeleton program that forks and debugs
the inferior process.
The program presented `here <https://eli.thegreenplace.net/2011/01/23/how-debuggers-work-part-1>`_
+(`source <https://github.com/eliben/code-for-blog/blob/master/2011/simple_tracer.c>`_)
is a great starting point. There is also an AArch64 specific example in
`the LLDB examples folder <https://github.com/llvm/llvm-project/tree/main/lldb/examples/ptrace_example.c>`_.
More information about the lldb-commits
mailing list