[Lldb-commits] [PATCH] RFC: Proposed change in the disassembly default format in lldb

Jason Molenda jmolenda at apple.com
Wed Feb 11 21:11:48 PST 2015

In r219544 (2014-10-10) I changed the default disassembly format to more closely resemble gdb's.  After living with this format for a few months, I've found it has obvious shortcomings with C++ and Objective-C programs, and I want to try a new approach.

Originally, lldb's disassembly displayed the module and function/symbol name on a line by itself whenever a new function/symbol began.  Each line of assembly then showed the file/load address followed by the opcode, operands, and comments (e.g. the target of a branch insn).  Branches within the same function had a comment listing the full function name plus an offset.  Note that the instruction lines themselves did not display an offset, just raw addresses, so you had to compare the full address of the branch target against the disassembly output to find where the branch went.  When the branch target was in inlined code, lldb would print all of the inlined functions in the comment field (on separate lines).

In October I changed this to more closely resemble gdb's output:  each line has the file/load address, the function name, the offset into the function ("+35"), opcode, operands, and comment.  Comments for branches to the same function behaved as before, but inlined functions were no longer included.  I try to elide function argument types (e.g. from a demangled C++ name), but with templated methods the function name can still be enormous.
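
To illustrate the kind of argument elision meant here, below is a minimal C++ sketch.  It is not lldb's implementation: ElideArguments is a hypothetical helper that strips only a trailing, balanced parameter list.  Names that end in template arguments rather than a parameter list pass through untouched, which is why templated methods can still produce enormous names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not lldb's code): removes a trailing, balanced
// parameter list from a demangled name, e.g.
//   "CommandObjectBreakpointList::DoExecute(lldb_private::Args&, lldb_private::CommandReturnObject&)"
//   -> "CommandObjectBreakpointList::DoExecute"
// It scans backwards from the final ')' to its matching '(' so nested
// parentheses (function-pointer parameters, etc.) are skipped. Names that
// do not end in ')' (template arguments, a trailing " const") are returned
// unchanged, so templated names stay long.
std::string ElideArguments(const std::string& name) {
    if (name.empty() || name.back() != ')')
        return name;
    int depth = 0;
    for (std::size_t i = name.size(); i-- > 0;) {
        if (name[i] == ')') {
            ++depth;
        } else if (name[i] == '(' && --depth == 0) {
            return name.substr(0, i);  // everything before the parameter list
        }
    }
    return name;  // unbalanced parentheses: leave the name alone
}
```

For example, "f(void (*)(int))" becomes "f", while "Foo::Bar() const" comes back unchanged because this simple sketch only looks for a final ')'.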

This style of disassembly looks pretty good for short C function names, for example:

(lldb) disass -c 20
   0x7fff94fbe188 <mach_msg_trap>: movq   %rcx, %r10
   0x7fff94fbe18b <mach_msg_trap+3>: movl   $0x100001f, %eax
   0x7fff94fbe190 <mach_msg_trap+8>: syscall 
-> 0x7fff94fbe192 <mach_msg_trap+10>: retq   
   0x7fff94fbe193 <mach_msg_trap+11>: nop    
   0x7fff94fbe194 <mach_msg_overwrite_trap>: movq   %rcx, %r10

But as soon as you get a hefty C++ name in there, it becomes very messy:

0x107915454 <CommandObjectBreakpointList::DoExecute+68>: jne    0x1be9331                 ; CommandObjectBreakpointList::DoExecute + 113 at CommandObjectBreakpoint.cpp:1420

Or consider this extreme example, which I found in lldb after 30 seconds of looking (function name only):

std::__1::function<std::__1::shared_ptr<lldb_private::TypeSummaryImpl> (lldb_private::ValueObject&)>::function<CommandObjectTypeSummary::CommandObjectTypeSummary(lldb_private::CommandInterpreter&)::'lambda'(lldb_private::ValueObject&)>

I want to go with a hybrid of these two styles.  When a new symbol begins, we print the full module + function name.  On each assembly line, we print the file/load address, the offset into the function in angle brackets, the opcode, operands, and comment; in the comments, branches to the SAME function use the <+36> style.  An example:

(lldb) disass
    0x107915410 <+0>:    pushq  %rbp
    0x107915411 <+1>:    movq   %rsp, %rbp
    0x107915414 <+4>:    subq   $0x170, %rsp
    0x10791541b <+11>:   movq   %rdi, -0x20(%rbp)
    0x10791541f <+15>:   movq   %rsi, -0x28(%rbp)
    0x107915423 <+19>:   movq   %rdx, -0x30(%rbp)
    0x107915427 <+23>:   movq   -0x20(%rbp), %rdx
->  0x10791542b <+27>:   movq   %rdx, %rsi
    0x10791542e <+30>:   movb   0x165(%rdx), %al
    0x107915434 <+36>:   andb   $0x1, %al
    0x107915436 <+38>:   movq   %rsi, %rdi
    0x107915439 <+41>:   movzbl %al, %esi
    0x10791543c <+44>:   movq   %rdx, -0xf8(%rbp)
    0x107915443 <+51>:   callq  0x107d87bb0               ; lldb_private::CommandObject::GetSelectedOrDummyTarget at CommandObject.cpp:1045
    0x107915448 <+56>:   movq   %rax, -0x38(%rbp)
    0x10791544c <+60>:   cmpq   $0x0, -0x38(%rbp)
    0x107915454 <+68>:   jne    0x107915481               ; <+113> at CommandObjectBreakpoint.cpp:1420
    0x10791545a <+74>:   leaq   0xf54d21(%rip), %rsi      ; "Invalid target. No current target or breakpoints."
    0x107915461 <+81>:   movq   -0x30(%rbp), %rdi
    0x107915465 <+85>:   callq  0x107d93640               ; lldb_private::CommandReturnObject::AppendError at CommandReturnObject.cpp:135
    0x10791546a <+90>:   movl   $0x1, %esi
    0x10791546f <+95>:   movq   -0x30(%rbp), %rdi
    0x107915473 <+99>:   callq  0x107d93760               ; lldb_private::CommandReturnObject::SetStatus at CommandReturnObject.cpp:172
    0x107915478 <+104>:  movb   $0x1, -0x11(%rbp)
    0x10791547c <+108>:  jmp    0x1079158bd               ; <+1197> at CommandObjectBreakpoint.cpp:1470
    0x107915481 <+113>:  movq   -0x38(%rbp), %rdi
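
The per-line layout of the hybrid style can be sketched in C++ as follows.  This is a hedged illustration, not the code in the patch: Insn and FormatHybridLine are hypothetical names, the address range of the containing function is assumed to be known, the column padding is simplified, and branch targets outside the function get no comment here (lldb instead looks up and prints their symbol, as in the callq lines above).

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical, simplified instruction record; lldb's own classes
// (e.g. lldb_private::Instruction) are not modeled here.
struct Insn {
    uint64_t addr;                          // file/load address
    std::string opcode;                     // e.g. "jne"
    std::string operands;                   // e.g. "0x107915481"
    std::optional<uint64_t> branch_target;  // set for branches and calls
};

// Formats one line in the proposed hybrid style, e.g.
//   0x107915454 <+68>: jne    0x107915481 ; <+113>
// [func_start, func_end) is the address range of the containing function.
// A branch target inside that range is shown as "<+N>"; targets elsewhere
// get no comment (symbol lookup for them is out of scope for this sketch).
std::string FormatHybridLine(const Insn& insn, uint64_t func_start,
                             uint64_t func_end) {
    assert(insn.addr >= func_start && insn.addr < func_end);
    std::ostringstream os;
    os << "0x" << std::hex << insn.addr << std::dec
       << " <+" << (insn.addr - func_start) << ">: "
       << std::left << std::setw(6) << insn.opcode << ' ' << insn.operands;
    if (insn.branch_target && *insn.branch_target >= func_start &&
        *insn.branch_target < func_end) {
        os << " ; <+" << (*insn.branch_target - func_start) << '>';
    }
    return os.str();
}
```

With the function starting at 0x107915410, the jne at 0x107915454 comes out as "0x107915454 <+68>: jne    0x107915481 ; <+113>", matching the offsets in the listing above.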

The main drawback of this new arrangement is that, while looking at a long series of instructions, you may forget the name of the function/method and need to scroll back to the start of the disassembly to find it.  A minor detail: the code makes two passes over the instruction list, first to find the maximum width of the address component, then to pad all the lines so the opcodes line up.  For instance,

(lldb)  disass -c 30 -n mach_msg_trap
    0x7fff94fbe188 <+0>:  movq   %rcx, %r10
    0x7fff94fbe18b <+3>:  movl   $0x100001f, %eax
    0x7fff94fbe190 <+8>:  syscall 
    0x7fff94fbe192 <+10>: retq   
    0x7fff94fbe193 <+11>: nop    

    0x7fff6a867210 <+0>:  movq   %rcx, %r10
    0x7fff6a867213 <+3>:  movl   $0x100001f, %eax
    0x7fff6a867218 <+8>:  syscall 
    0x7fff6a86721a <+10>: retq   
    0x7fff6a86721b <+11>: nop    
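
The two-pass padding can be sketched in C++ as follows.  Line, AddressColumn, and AlignLines are hypothetical names, and each line's opcode/operand/comment text is assumed to be formatted already; the point is only that the first pass finds the widest address column and the second pads every line to it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical pre-formatted line: address, offset into the function, and
// the rest of the line (opcode, operands, comment) already rendered.
struct Line {
    uint64_t addr;
    uint64_t offset;
    std::string rest;
};

// Renders the "0x... <+N>:" column for one line.
static std::string AddressColumn(const Line& l) {
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, "0x%llx <+%llu>:",
                          static_cast<unsigned long long>(l.addr),
                          static_cast<unsigned long long>(l.offset));
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);  // no truncation
    (void)n;  // avoid an unused-variable warning when NDEBUG is defined
    return std::string(buf);
}

// Pass 1 finds the widest address column; pass 2 pads every line to that
// width plus one space so the opcodes start in the same column.
std::vector<std::string> AlignLines(const std::vector<Line>& lines) {
    std::size_t width = 0;
    for (const Line& l : lines)
        width = std::max(width, AddressColumn(l).size());

    std::vector<std::string> out;
    out.reserve(lines.size());
    for (const Line& l : lines) {
        std::string col = AddressColumn(l);
        col.append(width - col.size() + 1, ' ');
        out.push_back(col + l.rest);
    }
    return out;
}
```

Fed the first and fourth lines of the mach_msg_trap listing, it yields "0x7fff94fbe188 <+0>:  movq   %rcx, %r10" (two spaces) and "0x7fff94fbe192 <+10>: retq" (one space), so the opcodes line up.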

The disassembly format can be overridden with the 'disassembly-format' setting if people have specific preferences.  But given the kinds of method names we see in OO languages, I think this new hybrid style works best as the default.

Comments?  I'd like to land this in a couple of days if no one feels strongly about it.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D7578.19805.patch
Type: text/x-patch
Size: 33148 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/lldb-commits/attachments/20150212/d1e5443b/attachment.bin>
