[Lldb-commits] [PATCH] D155107: Add support for llvm::MCInstPrinter::setPrintBranchImmAsAddress
Greg Clayton via Phabricator via lldb-commits
lldb-commits at lists.llvm.org
Tue Jul 18 12:19:41 PDT 2023
clayborg added a comment.
In D155107#4510539 <https://reviews.llvm.org/D155107#4510539>, @ted wrote:
> In D155107#4504667 <https://reviews.llvm.org/D155107#4504667>, @jasonmolenda wrote:
>
>> Isn't it better to print branches within a function as an offset, given that our disassembly format by default lists the offset of each instruction. So instead of looking for a 6-digit long hex address, you're looking for a decimal offset in the output? I'm not sure if you disagree with my opinion, or if you have an example where it is clearer to have an address that I'm not seeing.
>
> llvm-objdump calls setPrintImmAsAddress(true). I think we should do the same. I also think that it looks better. @clayborg here is some riscv32 disassembly with and without the change:
>
> with:
>
> (lldb) dis -n factorial
> factrv32`factorial:
> 0x1047c <+0>: addi sp, sp, -32
> 0x1047e <+2>: sw ra, 28(sp)
> 0x10480 <+4>: sw s0, 24(sp)
> 0x10482 <+6>: addi s0, sp, 32
> 0x10484 <+8>: sw a0, -16(s0)
> 0x10488 <+12>: lw a0, -16(s0)
> 0x1048c <+16>: li a1, 1
> 0x1048e <+18>: beq a0, a1, 0x10496
> 0x10492 <+22>: j 0x104a4
> 0x10496 <+26>: j 0x1049a
> 0x1049a <+30>: li a0, 1
> 0x1049c <+32>: sw a0, -12(s0)
> 0x104a0 <+36>: j 0x104c2
> 0x104a4 <+40>: lw a0, -16(s0)
> 0x104a8 <+44>: sw a0, -20(s0)
> 0x104ac <+48>: addi a0, a0, -1
> 0x104ae <+50>: jal 0x1047c
> 0x104b0 <+52>: mv a1, a0
> 0x104b2 <+54>: lw a0, -20(s0)
> 0x104b6 <+58>: mul a0, a0, a1
> 0x104ba <+62>: sw a0, -12(s0)
> 0x104be <+66>: j 0x104c2
> 0x104c2 <+70>: lw a0, -12(s0)
> 0x104c6 <+74>: lw ra, 28(sp)
> 0x104c8 <+76>: lw s0, 24(sp)
> 0x104ca <+78>: addi sp, sp, 32
> 0x104cc <+80>: ret
>
> without:
>
> factrv32`factorial:
> 0x1047c <+0>: addi sp, sp, -32
> 0x1047e <+2>: sw ra, 28(sp)
> 0x10480 <+4>: sw s0, 24(sp)
> 0x10482 <+6>: addi s0, sp, 32
> 0x10484 <+8>: sw a0, -16(s0)
> 0x10488 <+12>: lw a0, -16(s0)
> 0x1048c <+16>: li a1, 1
> 0x1048e <+18>: beq a0, a1, 8
> 0x10492 <+22>: j 18
> 0x10496 <+26>: j 4
> 0x1049a <+30>: li a0, 1
> 0x1049c <+32>: sw a0, -12(s0)
> 0x104a0 <+36>: j 34
> 0x104a4 <+40>: lw a0, -16(s0)
> 0x104a8 <+44>: sw a0, -20(s0)
> 0x104ac <+48>: addi a0, a0, -1
> 0x104ae <+50>: jal -50
> 0x104b0 <+52>: mv a1, a0
> 0x104b2 <+54>: lw a0, -20(s0)
> 0x104b6 <+58>: mul a0, a0, a1
> 0x104ba <+62>: sw a0, -12(s0)
> 0x104be <+66>: j 4
> 0x104c2 <+70>: lw a0, -12(s0)
> 0x104c6 <+74>: lw ra, 28(sp)
> 0x104c8 <+76>: lw s0, 24(sp)
> 0x104ca <+78>: addi sp, sp, 32
> 0x104cc <+80>: ret
Definitely looks nicer with the full addresses IMHO.
> Addresses 0x10492 and 0x10496 are jumps. It's much easier, IMO, to see what the targets are with the change than without.
>
> Ultimately, I'd like to see more info for branch/jump targets, like x64 has, but that's another patch:
>
> 0x40053f <+31>: jmp 0x400563 ; <+67> at factorial.c:10
We already do this kind of comment when we know that there is an address that is being referred to in the disassembly. Maybe the "I know this opcode has an address" is not working for RISCV? I am not sure if there is a disassembler issue for RISCV only or what is going on. Check out the current arm64 disassembly of a simple. main function from the current lldb:
(lldb) disassemble --name main
a.out`main:
a.out[0x100002e98] <+0>: sub sp, sp, #0x70
a.out[0x100002e9c] <+4>: stp x29, x30, [sp, #0x60]
a.out[0x100002ea0] <+8>: add x29, sp, #0x60
a.out[0x100002ea4] <+12>: stur wzr, [x29, #-0xc]
a.out[0x100002ea8] <+16>: stur w0, [x29, #-0x10]
a.out[0x100002eac] <+20>: stur x1, [x29, #-0x18]
a.out[0x100002eb0] <+24>: stur wzr, [x29, #-0x1c]
a.out[0x100002eb4] <+28>: mov w8, #0x2 ; =2
a.out[0x100002eb8] <+32>: str w8, [sp, #0x4]
a.out[0x100002ebc] <+36>: stur w8, [x29, #-0x4]
a.out[0x100002ec0] <+40>: mov w8, #0x3 ; =3
a.out[0x100002ec4] <+44>: stur w8, [x29, #-0x8]
a.out[0x100002ec8] <+48>: ldur w8, [x29, #-0x4]
a.out[0x100002ecc] <+52>: ldur w9, [x29, #-0x8]
a.out[0x100002ed0] <+56>: mul w8, w8, w9
a.out[0x100002ed4] <+60>: stur w8, [x29, #-0x20]
a.out[0x100002ed8] <+64>: sub x2, x29, #0x2c
a.out[0x100002edc] <+68>: mov w8, #0x1 ; =1
a.out[0x100002ee0] <+72>: stur w8, [x29, #-0x2c]
a.out[0x100002ee4] <+76>: sub x0, x29, #0x28
a.out[0x100002ee8] <+80>: adrp x1, 0
a.out[0x100002eec] <+84>: add x1, x1, #0xe88 ; f at main.cpp:12
a.out[0x100002ef0] <+88>: str x1, [sp, #0x8]
a.out[0x100002ef4] <+92>: bl 0x100002f60 ; std::__1::thread::thread<void (&)(int), int, void> at thread:309
a.out[0x100002ef8] <+96>: ldr w8, [sp, #0x4]
a.out[0x100002efc] <+100>: ldr x1, [sp, #0x8]
a.out[0x100002f00] <+104>: add x2, sp, #0x24
a.out[0x100002f04] <+108>: str w8, [sp, #0x24]
a.out[0x100002f08] <+112>: add x0, sp, #0x28
a.out[0x100002f0c] <+116>: bl 0x100002f60 ; std::__1::thread::thread<void (&)(int), int, void> at thread:309
a.out[0x100002f10] <+120>: b 0x100002f14 ; <+124> at main.cpp
a.out[0x100002f14] <+124>: adrp x8, 6
a.out[0x100002f18] <+128>: ldr w8, [x8, #0x4]
a.out[0x100002f1c] <+132>: stur w8, [x29, #-0xc]
a.out[0x100002f20] <+136>: add x0, sp, #0x28
a.out[0x100002f24] <+140>: bl 0x100003dd4 ; symbol stub for: std::__1::thread::~thread()
a.out[0x100002f28] <+144>: sub x0, x29, #0x28
a.out[0x100002f2c] <+148>: bl 0x100003dd4 ; symbol stub for: std::__1::thread::~thread()
a.out[0x100002f30] <+152>: ldur w0, [x29, #-0xc]
a.out[0x100002f34] <+156>: ldp x29, x30, [sp, #0x60]
a.out[0x100002f38] <+160>: add sp, sp, #0x70
a.out[0x100002f3c] <+164>: ret
a.out[0x100002f40] <+168>: mov x8, x1
a.out[0x100002f44] <+172>: str x0, [sp, #0x18]
a.out[0x100002f48] <+176>: str w8, [sp, #0x14]
a.out[0x100002f4c] <+180>: sub x0, x29, #0x28
a.out[0x100002f50] <+184>: bl 0x100003dd4 ; symbol stub for: std::__1::thread::~thread()
a.out[0x100002f54] <+188>: b 0x100002f58 ; <+192> at main.cpp:23:1
a.out[0x100002f58] <+192>: ldr x0, [sp, #0x18]
a.out[0x100002f5c] <+196>: bl 0x100003d8c ; symbol stub for: _Unwind_Resume
> BTW, this only affects backends that use the setting. x64 does not - that's the output from lldb 15.03, without any other changes.
Looks like other disassemblers already show full addresses for the branches and calls (at least arm64 does from my output above), so not sure why RISCV would require this setting, but x86_64 and arm64 wouldn't because it is already working that way??
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D155107/new/
https://reviews.llvm.org/D155107
More information about the lldb-commits
mailing list