[Lldb-commits] [PATCH] D155107: Add support for llvm::MCInstPrinter::setPrintBranchImmAsAddress

Tue Jul 18 12:19:41 PDT 2023

clayborg added a comment.

In D155107#4510539 <https://reviews.llvm.org/D155107#4510539>, @ted wrote:

> In D155107#4504667 <https://reviews.llvm.org/D155107#4504667>, @jasonmolenda wrote:
>
>> Isn't it better to print branches within a function as an offset, given that our disassembly format by default lists the offset of each instruction.  So instead of looking for a 6-digit long hex address, you're looking for a decimal offset in the output?  I'm not sure if you disagree with my opinion, or if you have an example where it is clearer to have an address that I'm not seeing.
>
> llvm-objdump calls setPrintImmAsAddress(true). I think we should do the same. I also think that it looks better. @clayborg here is some riscv32 disassembly with and without the change:
>
> with:
>
>   (lldb) dis -n factorial
>   factrv32`factorial:
>       0x1047c <+0>:  addi   sp, sp, -32
>       0x1047e <+2>:  sw     ra, 28(sp)
>       0x10480 <+4>:  sw     s0, 24(sp)
>       0x10482 <+6>:  addi   s0, sp, 32
>       0x10484 <+8>:  sw     a0, -16(s0)
>       0x10488 <+12>: lw     a0, -16(s0)
>       0x1048c <+16>: li     a1, 1
>       0x1048e <+18>: beq    a0, a1, 0x10496
>       0x10492 <+22>: j      0x104a4
>       0x10496 <+26>: j      0x1049a
>       0x1049a <+30>: li     a0, 1
>       0x1049c <+32>: sw     a0, -12(s0)
>       0x104a0 <+36>: j      0x104c2
>       0x104a4 <+40>: lw     a0, -16(s0)
>       0x104a8 <+44>: sw     a0, -20(s0)
>       0x104ac <+48>: addi   a0, a0, -1
>       0x104ae <+50>: jal    0x1047c
>       0x104b0 <+52>: mv     a1, a0
>       0x104b2 <+54>: lw     a0, -20(s0)
>       0x104b6 <+58>: mul    a0, a0, a1
>       0x104ba <+62>: sw     a0, -12(s0)
>       0x104be <+66>: j      0x104c2
>       0x104c2 <+70>: lw     a0, -12(s0)
>       0x104c6 <+74>: lw     ra, 28(sp)
>       0x104c8 <+76>: lw     s0, 24(sp)
>       0x104ca <+78>: addi   sp, sp, 32
>       0x104cc <+80>: ret    
>
> without:
>
>   factrv32`factorial:
>       0x1047c <+0>:  addi   sp, sp, -32
>       0x1047e <+2>:  sw     ra, 28(sp)
>       0x10480 <+4>:  sw     s0, 24(sp)
>       0x10482 <+6>:  addi   s0, sp, 32
>       0x10484 <+8>:  sw     a0, -16(s0)
>       0x10488 <+12>: lw     a0, -16(s0)
>       0x1048c <+16>: li     a1, 1
>       0x1048e <+18>: beq    a0, a1, 8
>       0x10492 <+22>: j      18
>       0x10496 <+26>: j      4
>       0x1049a <+30>: li     a0, 1
>       0x1049c <+32>: sw     a0, -12(s0)
>       0x104a0 <+36>: j      34
>       0x104a4 <+40>: lw     a0, -16(s0)
>       0x104a8 <+44>: sw     a0, -20(s0)
>       0x104ac <+48>: addi   a0, a0, -1
>       0x104ae <+50>: jal    -50
>       0x104b0 <+52>: mv     a1, a0
>       0x104b2 <+54>: lw     a0, -20(s0)
>       0x104b6 <+58>: mul    a0, a0, a1
>       0x104ba <+62>: sw     a0, -12(s0)
>       0x104be <+66>: j      4
>       0x104c2 <+70>: lw     a0, -12(s0)
>       0x104c6 <+74>: lw     ra, 28(sp)
>       0x104c8 <+76>: lw     s0, 24(sp)
>       0x104ca <+78>: addi   sp, sp, 32
>       0x104cc <+80>: ret    

Definitely looks nicer with the full addresses IMHO.

> Addresses 0x10492 and 0x10496 are jumps. It's much easier, IMO, to see what the targets are with the change than without.
>
> Ultimately, I'd like to see more info for branch/jump targets, like x64 has, but that's another patch:
>
>   0x40053f <+31>: jmp    0x400563                  ; <+67> at factorial.c:10

We already do this kind of comment when we know that there is an address that is being referred to in the disassembly. Maybe the "I know this opcode has an address" is not working for RISCV? I am not sure if there is a disassembler issue for RISCV only or what is going on. Check out the current arm64 disassembly of a simple. main function from the current lldb:

  (lldb) disassemble --name main
  a.out`main:
  a.out[0x100002e98] <+0>:   sub    sp, sp, #0x70
  a.out[0x100002e9c] <+4>:   stp    x29, x30, [sp, #0x60]
  a.out[0x100002ea0] <+8>:   add    x29, sp, #0x60
  a.out[0x100002ea4] <+12>:  stur   wzr, [x29, #-0xc]
  a.out[0x100002ea8] <+16>:  stur   w0, [x29, #-0x10]
  a.out[0x100002eac] <+20>:  stur   x1, [x29, #-0x18]
  a.out[0x100002eb0] <+24>:  stur   wzr, [x29, #-0x1c]
  a.out[0x100002eb4] <+28>:  mov    w8, #0x2                  ; =2 
  a.out[0x100002eb8] <+32>:  str    w8, [sp, #0x4]
  a.out[0x100002ebc] <+36>:  stur   w8, [x29, #-0x4]
  a.out[0x100002ec0] <+40>:  mov    w8, #0x3                  ; =3 
  a.out[0x100002ec4] <+44>:  stur   w8, [x29, #-0x8]
  a.out[0x100002ec8] <+48>:  ldur   w8, [x29, #-0x4]
  a.out[0x100002ecc] <+52>:  ldur   w9, [x29, #-0x8]
  a.out[0x100002ed0] <+56>:  mul    w8, w8, w9
  a.out[0x100002ed4] <+60>:  stur   w8, [x29, #-0x20]
  a.out[0x100002ed8] <+64>:  sub    x2, x29, #0x2c
  a.out[0x100002edc] <+68>:  mov    w8, #0x1                  ; =1 
  a.out[0x100002ee0] <+72>:  stur   w8, [x29, #-0x2c]
  a.out[0x100002ee4] <+76>:  sub    x0, x29, #0x28
  a.out[0x100002ee8] <+80>:  adrp   x1, 0
  a.out[0x100002eec] <+84>:  add    x1, x1, #0xe88            ; f at main.cpp:12
  a.out[0x100002ef0] <+88>:  str    x1, [sp, #0x8]
  a.out[0x100002ef4] <+92>:  bl     0x100002f60               ; std::__1::thread::thread<void (&)(int), int, void> at thread:309
  a.out[0x100002ef8] <+96>:  ldr    w8, [sp, #0x4]
  a.out[0x100002efc] <+100>: ldr    x1, [sp, #0x8]
  a.out[0x100002f00] <+104>: add    x2, sp, #0x24
  a.out[0x100002f04] <+108>: str    w8, [sp, #0x24]
  a.out[0x100002f08] <+112>: add    x0, sp, #0x28
  a.out[0x100002f0c] <+116>: bl     0x100002f60               ; std::__1::thread::thread<void (&)(int), int, void> at thread:309
  a.out[0x100002f10] <+120>: b      0x100002f14               ; <+124> at main.cpp
  a.out[0x100002f14] <+124>: adrp   x8, 6
  a.out[0x100002f18] <+128>: ldr    w8, [x8, #0x4]
  a.out[0x100002f1c] <+132>: stur   w8, [x29, #-0xc]
  a.out[0x100002f20] <+136>: add    x0, sp, #0x28
  a.out[0x100002f24] <+140>: bl     0x100003dd4               ; symbol stub for: std::__1::thread::~thread()
  a.out[0x100002f28] <+144>: sub    x0, x29, #0x28
  a.out[0x100002f2c] <+148>: bl     0x100003dd4               ; symbol stub for: std::__1::thread::~thread()
  a.out[0x100002f30] <+152>: ldur   w0, [x29, #-0xc]
  a.out[0x100002f34] <+156>: ldp    x29, x30, [sp, #0x60]
  a.out[0x100002f38] <+160>: add    sp, sp, #0x70
  a.out[0x100002f3c] <+164>: ret    
  a.out[0x100002f40] <+168>: mov    x8, x1
  a.out[0x100002f44] <+172>: str    x0, [sp, #0x18]
  a.out[0x100002f48] <+176>: str    w8, [sp, #0x14]
  a.out[0x100002f4c] <+180>: sub    x0, x29, #0x28
  a.out[0x100002f50] <+184>: bl     0x100003dd4               ; symbol stub for: std::__1::thread::~thread()
  a.out[0x100002f54] <+188>: b      0x100002f58               ; <+192> at main.cpp:23:1
  a.out[0x100002f58] <+192>: ldr    x0, [sp, #0x18]
  a.out[0x100002f5c] <+196>: bl     0x100003d8c               ; symbol stub for: _Unwind_Resume

> BTW, this only affects backends that use the setting. x64 does not - that's the output from lldb 15.03, without any other changes.

Looks like other disassemblers already show full addresses for the branches and calls (at least arm64 does from my output above), so not sure why RISCV would require this setting, but x86_64 and arm64 wouldn't because it is already working that way??

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155107/new/

https://reviews.llvm.org/D155107