[llvm-dev] [EXT] Re: GSoC19: Improve LLVM binary utilities
Krzysztof Parzyszek via llvm-dev
llvm-dev at lists.llvm.org
Thu Mar 28 06:53:32 PDT 2019
This augmented output should not be the default; it should only be enabled with an option.
--
Krzysztof Parzyszek mailto:kparzysz at quicinc.com LLVM compiler development
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of bd1976 llvm via llvm-dev
Sent: Wednesday, March 27, 2019 9:33 AM
To: Jordan Rupprecht <rupprecht at google.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: [EXT] Re: [llvm-dev] GSoC19: Improve LLVM binary utilities
Hi Seiya,
If you want a project that is not trivial but is doable in a summer, will be a great learning opportunity, and will be very useful to developers, then I would suggest improving the disassembly of object files on x86_64. I can't count the number of times this has caused confusion.
Consider the following assembly:
nop
nop
.globl sym1
sym1:
ret
.section .text2,"ax",@progbits
jmp .text
jmp .text+1
jmp .text+6
jmp sym1
.globl sym2
sym2:
jmp .text2
jmp .text2+1
jmp .text2+20
jmp sym2
jmp sym2@plt
When assembled and then disassembled you will see output something like:
Disassembly of section .text:
0x00000000: 90 nop
0x00000001: 90 nop
sym1:
0x00000002: C3 ret
Disassembly of section .text2:
0x00000000: E9 00 00 00 00 jmp .text+0xFFFFFFFFFFFFFFFC (0000000000000005h)
0x00000005: E9 00 00 00 00 jmp .text+0xFFFFFFFFFFFFFFFD (000000000000000Ah)
0x0000000A: E9 00 00 00 00 jmp sym1 (000000000000000Fh)
0x0000000F: E9 00 00 00 00 jmp sym2 (0000000000000014h)
sym2:
0x00000014: EB EA jmp 0000000000000000h
0x00000016: EB E9 jmp 0000000000000001h
0x00000018: EB FA jmp sym2 (0000000000000014h)
0x0000001A: EB F8 jmp sym2 (0000000000000014h)
0x0000001C: E9 00 00 00 00 jmp sym2 (0000000000000021h)
This is pretty confusing. What is wanted is output more like this:
Disassembly of section .text[0]:
0x00000000: 90 nop
0x00000001: 90 nop
sym1:
0x00000002: C3 ret
Disassembly of section .text2[1]:
0x00000000: E9 ?? ?? ?? ?? jmp .text[0] + 0x0
0x00000005: E9 ?? ?? ?? ?? jmp .text[0] + 0x1
0x0000000A: E9 ?? ?? ?? ?? jmp .text[0] + 0x6 (sym1 + 0x4)
0x0000000F: E9 ?? ?? ?? ?? jmp sym1 + 0x0
sym2:
0x00000014: EB EA jmp .text2[1] + 0x0
0x00000016: EB E9 jmp .text2[1] + 0x1
0x00000018: EB FA jmp .text2[1] + 0x14 (sym2 + 0x0)
0x0000001A: EB F8 jmp .text2[1] + 0x14 (sym2 + 0x0)
0x0000001C: E9 ?? ?? ?? ?? jmp sym2 (via GOT)
Please forgive me for using the output of our internal tools to illustrate the point (I prepared this internally and don't have much time to write this email so I just copied and pasted). If you try this with LLVM's binary tools or GNU's you will see similar results.
Concrete suggestions for improvements:
• section-relative targets augmented with symbol information
• ?? to indicate bytes that will be patched by relocations
• targets of PC-relative jumps computed correctly
• section names augmented with their indices (section names are ambiguous)
• branches via PLT indicated with added comments
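As a rough illustration of the first and fourth suggestions, here is a minimal Python sketch that formats a section-relative target with the section index and the nearest preceding symbol. The function name `annotate` and the `(offset, name)` symbol list are hypothetical and are not taken from any existing tool.

```python
from bisect import bisect_right

def annotate(section_name, section_index, offset, symbols):
    """Format a branch target as 'section[index] + offset (symbol + delta)'.

    symbols is a hypothetical list of (offset, name) pairs for symbols
    defined in the target section, sorted by offset.
    """
    text = f"{section_name}[{section_index}] + {offset:#x}"
    # Find the last symbol defined at or before the target offset.
    offsets = [sym_offset for sym_offset, _ in symbols]
    i = bisect_right(offsets, offset) - 1
    if i >= 0:
        sym_offset, sym_name = symbols[i]
        text += f" ({sym_name} + {offset - sym_offset:#x})"
    return text

# sym1 is defined at offset 0x2 of .text in the example above.
print(annotate(".text", 0, 0x6, [(0x2, "sym1")]))
print(annotate(".text", 0, 0x1, [(0x2, "sym1")]))
```

Targets that precede every symbol in the section are printed without a symbol annotation, as in the second call.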
This is not trivial to accomplish. Specifically, computing the targets of branches will require either more integration between the binary tools and the disassembler or, possibly, having the binary tools create a fake layout and then patch up the instructions so that they disassemble "correctly".
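To make the arithmetic concrete, here is a small Python sketch of the two cases in the example. It assumes RELA-style PC-relative relocations (such as R_X86_64_PC32 or R_X86_64_PLT32) whose 4-byte field is the last thing in the instruction; the helper names are hypothetical.

```python
def rel_jump_target(address, length, displacement_bytes):
    """Target of a resolved PC-relative jump: end of instruction + displacement."""
    disp = int.from_bytes(displacement_bytes, "little", signed=True)
    return address + length + disp

def to_signed64(value):
    """Reinterpret an unsigned 64-bit value (e.g. a printed addend) as signed."""
    return value - (1 << 64) if value >= (1 << 63) else value

def reloc_target_offset(addend, field_to_end=4):
    """Offset from the relocation's symbol that the branch will reach.

    The linker stores S + A - P in the field at address P, and the CPU adds
    that value to the address of the next instruction, P + field_to_end,
    so the branch lands at S + A + field_to_end.
    """
    return addend + field_to_end

# Short jumps inside .text2 need no relocation: EB EA at 0x14 targets 0x0.
print(hex(rel_jump_target(0x14, 2, bytes([0xEA]))))
# The addend printed as 0xFFFFFFFFFFFFFFFC is -4, so the real target is .text + 0.
print(reloc_target_offset(to_signed64(0xFFFFFFFFFFFFFFFC)))
```

This is why displaying the raw addend (`.text+0xFFFFFFFFFFFFFFFC`) is misleading: the offset the reader cares about is the addend plus the distance from the relocated field to the end of the instruction.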
If you manage to get that done, I would suggest going further and enhancing the disassembly by adding color coding, outlining, or ASCII art to the output to show things like loops, if statements, and basic blocks. For inspiration, see "rich disassembly" in this presentation by Apple: http://devimages.apple.com/llvm/videos/LLVMMCinPractice.m4v.