[lldb-dev] Documentation

Thu Nov 10 15:05:16 PST 2011

Andreas,

I am glad you are interested in finding out more. Unfortunately, there aren't many other documents you can read. What I can do is point out what LLDB does differently from other typical debuggers.

1 - Types are converted from debug info back into correct clang types.

Most debuggers make up their own internal type representation, geared toward how the information is represented in the debug info and how the debugger's expression parser will want to use that type information.

LLDB converts DWARF back into real clang types. It currently makes an AST context per module (executable, shared library, or other loadable code container) and lazily populates the AST as needed by expressions.

2 - We use the compiler as our expression parser

Because we convert all of our types into clang types, we can use clang as our expression parser. LLDB just pretends to be a precompiled header when the compiler is parsing an expression, and we can answer all of the precompiled header queries for information based on the current execution context (which frame we have selected in which thread of a specific process). The other benefit of this approach is that if clang adds C++0x support, we already have support for it in our expression parser just by updating to the latest version of clang. It also allows us to support any feature currently supported by clang. Other debuggers must add new runtime features or modify their expression parsers each time a new language feature is added. This also allows us to have expression local variables:

(lldb) expression
for (int i=0; i<10; i++)
  (int)printf("i = %i\n", i);

i = 0
i = 1
i = 2
i = 3
i = 4
i = 5
i = 6
i = 7
i = 8
i = 9
(lldb) 

Note that "i" here is an expression local. If you had code like:

  int main ()
  {
      int i=0;
->    return 0;
  }

And you were stopped at the "return 0;" statement, and evaluated the above multi-line expression, it is as if you added a new lexical block scope:

  int main ()
  {
      int i=0;
      {
        for (int i=0; i<10; i++)
          (int)printf("i = %i\n", i);
      }
->    return 0;
  }
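
The scoping rule can be checked with ordinary C++. This is only a minimal sketch: "frame_with_expression" is a hypothetical stand-in for the main() frame above, not anything LLDB generates. The loop's "i" shadows the frame's "i" inside the added block, and the frame's "i" is unchanged afterward.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the main() frame shown above.
// Returns the frame's own "i" after the expression block has run.
int frame_with_expression()
{
    int i = 0;                          // the frame's variable
    {
        // The expression acts like a new lexical block: this "i" shadows
        // the outer one and goes away when the block ends.
        for (int i = 0; i < 10; i++)
            std::printf("i = %i\n", i);
    }
    assert(i == 0);                     // the frame's "i" was never touched
    return i;
}
```

Calling frame_with_expression() prints the same ten lines as the session above and returns the untouched outer value.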

3 - The expression parser can JIT the code you have for expressions

We use clang to JIT the code for expressions and run it in the process you are debugging. We also let clang handle all of the ABI issues when it comes to calling functions, which is typically a place where debuggers mess up. For example, if you have an expression like:

(lldb) expr c4 = complex_add (c1, c2) + c3

We actually JIT up a function that takes a single "data *" as a parameter (which is easy for debuggers to figure out how to call) and we define data as:

struct data
{
   complex c1; // variable used as first arg to complex_add() function
   complex c2; // variable used as second arg to complex_add() function
   complex c3; // variable to add to result of complex_add() function
   complex result; // result of the expression (which will be "c4")
};

Then we JIT up a function

void
$___lldb_expr (data *data_ptr)
{
    data_ptr->result = complex_add (data_ptr->c1, data_ptr->c2) + data_ptr->c3;
}

Why is this important? Because now none of our debugger plug-ins need to know how and where to put arguments to functions. We let clang handle the ABI issues and let it place the variables in registers or on the stack as needed, including dealing with the return type from functions. This keeps the debugger out of the business of having to know the ABI for the current target (a big source of bugs in debugger expression parsers).
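
To make the idea concrete, here is a compilable sketch of the same pattern in plain C++. The names "complex_t", "expr_args" and "lldb_expr_sketch" are made up for illustration, and the real generated code will look different. The point is only that the caller has to pass a single pointer, while the compiler takes care of how the struct arguments and the return value of complex_add() are passed.

```cpp
#include <cassert>

// Hypothetical stand-ins for the user's type and function.
struct complex_t { double re; double im; };

complex_t operator+(complex_t a, complex_t b)
{
    return {a.re + b.re, a.im + b.im};
}

complex_t complex_add(complex_t a, complex_t b) { return a + b; }

// All inputs and the result live in one struct, so the caller only
// needs to know how to pass a single pointer.
struct expr_args
{
    complex_t c1;      // first arg to complex_add()
    complex_t c2;      // second arg to complex_add()
    complex_t c3;      // value added to the result of complex_add()
    complex_t result;  // result of the whole expression
};

// Wrapper with the simple signature. The compiler decides how c1, c2 and
// the return value of complex_add() are passed for the current ABI.
void lldb_expr_sketch(expr_args *args)
{
    assert(args != nullptr);
    args->result = complex_add(args->c1, args->c2) + args->c3;
}
```

A caller fills in c1, c2 and c3, calls lldb_expr_sketch() with the address of the struct, and reads the answer back out of "result".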

4 - JIT'ed code can be used for more complete expression validation

We write our own helper functions that we JIT up and copy into the process we are debugging. We can post-process the Intermediate Representation (IR) we get after we compile an expression and put extra checks into your expressions. So for an expression like:

(lldb) expr 2 + pt_ptr->x + pt2_ptr->y

We can actually rewrite this expression to use our "void *pointer_validation(void *)" function so we would actually run:

2 + pointer_validation (pt_ptr)->x + pointer_validation (pt2_ptr)->y

And if either "pt_ptr" or "pt2_ptr" is invalid, we can stop the expression early and let the user know that a pointer was invalid. This can help to detect issues, especially when a bad pointer points to memory just before valid memory and the field access would actually put you back into valid memory.
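
The rewrite can be mimicked in standalone C++. How validity is actually decided is not covered here, so this sketch assumes a hypothetical registry of known-good address ranges ("register_valid" and "g_valid_ranges" are invented for the example) and reports a bad pointer by throwing. Only the shape of the rewritten expression is meant to match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct valid_range { std::uintptr_t start; std::size_t size; };

// Hypothetical registry of memory the checker treats as valid.
static std::vector<valid_range> g_valid_ranges;

void register_valid(const void *p, std::size_t size)
{
    assert(size > 0);
    g_valid_ranges.push_back({reinterpret_cast<std::uintptr_t>(p), size});
}

// Returns the pointer unchanged if it lies in a registered range,
// otherwise stops the "expression" by throwing.
void *pointer_validation(void *p)
{
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    for (const valid_range &r : g_valid_ranges)
        if (addr >= r.start && addr - r.start < r.size)
            return p;
    throw std::runtime_error("invalid pointer in expression");
}

struct point { int x; int y; };

// The expression "2 + pt_ptr->x + pt2_ptr->y" with a check inserted
// before each dereference.
int evaluate(point *pt_ptr, point *pt2_ptr)
{
    return 2 + static_cast<point *>(pointer_validation(pt_ptr))->x
             + static_cast<point *>(pointer_validation(pt2_ptr))->y;
}
```

With both points registered, evaluate() returns the ordinary sum. If either pointer is outside every registered range, the exception is raised before the field is read.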

5 - LLDB parses debug information lazily.

Debuggers take many different approaches to parsing debug info. GDB tends to parse everything one compile unit at a time. LLDB will parse only what it needs as it needs it. If you only touch one function in a compile unit with 100 functions, we will have parsed only that function and the types needed for it. This can help save on memory footprint.
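
A rough sketch of the parse-on-demand idea, using invented names ("lazy_compile_unit", "function_info") rather than anything from the LLDB sources: the raw entries stay untouched until someone asks for one, and each entry is parsed at most once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parsed form of one function's debug info.
struct function_info { std::string name; };

class lazy_compile_unit
{
public:
    explicit lazy_compile_unit(std::vector<std::string> raw_names)
        : m_raw(std::move(raw_names)) {}

    // Parse a function only on first lookup; later lookups hit the cache.
    const function_info &get_function(std::size_t index)
    {
        assert(index < m_raw.size());
        auto it = m_parsed.find(index);
        if (it == m_parsed.end())
            it = m_parsed.emplace(index, function_info{m_raw[index]}).first;
        return it->second;
    }

    // How many functions have been parsed so far.
    std::size_t parsed_count() const { return m_parsed.size(); }

private:
    std::vector<std::string> m_raw;                 // stands in for unparsed debug info
    std::map<std::size_t, function_info> m_parsed;  // filled in on demand
};
```

For a unit built from 100 names, looking up one function leaves parsed_count() at 1, which is where the memory saving comes from.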

6 - LLDB can run multiple debug sessions simultaneously:

(lldb) target create /tmp/server.exe
(lldb) breakpoint set --name main
(lldb) run
Process 1000 launched: '/tmp/server.exe' (x86_64)
...
(lldb) target create /tmp/client.exe
(lldb) breakpoint set --name main
(lldb) run
Process 1001 launched: '/tmp/client.exe' (x86_64)
...
(lldb) target list 
Current targets:
  target #0: /tmp/server.exe ( arch=x86_64-apple-darwin, platform=localhost, pid=1000, state=stopped )
* target #1: /tmp/client.exe ( arch=x86_64-apple-darwin, platform=localhost, pid=1001, state=stopped )
(lldb) target select 0
(lldb) run
(lldb) target select 1
(lldb) run

LLDB can also run binaries for different architectures from the same debugger, so you could debug a local server and a client for a different architecture running on a remote machine, all in the same session.

7 - LLDB is built around plug-ins

This means that no matter what you are debugging, you always have access to the other plug-ins for different architectures. So you can use any of the supported disassemblers from any target. Below we create an x86_64 target and debug it, and then disassemble memory in that x86_64 process using the ARM disassembler:

(lldb) target create /tmp/arm-compiler-on-x86_64
(lldb) breakpoint set --name main
(lldb) run
Process 1000 launched: '/tmp/arm-compiler-on-x86_64' (x86_64)
(lldb) disassemble --arch armv7 --count 32 0x12020300

GDB only has the disassemblers for the architecture it was built for and cannot cross disassemble.
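
One way to picture this is a registry keyed by architecture name, where the lookup does not depend on the current target at all. The sketch below is purely illustrative: "disassembler_registry" and "disassembler_fn" are hypothetical names, and the plug-ins here just return strings instead of decoding instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical plug-in signature: turn an address into a line of text.
using disassembler_fn = std::function<std::string(std::uint64_t)>;

class disassembler_registry
{
public:
    // Every plug-in registers itself under its architecture name.
    void add(const std::string &arch, disassembler_fn fn)
    {
        assert(static_cast<bool>(fn));
        m_plugins[arch] = std::move(fn);
    }

    // Lookup is by requested architecture, independent of any target.
    // Returns nullptr when no plug-in is registered for that name.
    const disassembler_fn *find(const std::string &arch) const
    {
        auto it = m_plugins.find(arch);
        return it == m_plugins.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, disassembler_fn> m_plugins;
};
```

Because all plug-ins sit in the same registry, a session debugging an x86_64 process can still ask for the "armv7" entry and run it over any address.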

There are many more important architectural differences, but I believe that I have outlined the important big differences above.

Greg Clayton

On Nov 10, 2011, at 9:22 AM, Andreas Donig wrote:

> Hello everybody,
> 
> I'm an undergrad student of computer science doing some research in debuggers. I've read J.B. Rosenberg's "How Debuggers Work", the GDB Internals manual and went through the documentation on LLDB's website and now I'm craving for more. I'd greatly appreciate if you'd let me know about other relevant documentation or anything else you could suggest as a reading.
> 
> Best regards
> Andreas