[llvm-dev] GSoC19: Improve LLVM binary utilities
bd1976 llvm via llvm-dev
llvm-dev at lists.llvm.org
Wed Mar 27 07:32:30 PDT 2019
Hi Seiya,
If you want a project that is not trivial but is doable in a summer, will
be a great learning opportunity, and will be very useful to developers,
then I would suggest improving the disassembly of object files on x86_64.
I can't count the number of times this has caused confusion.
Consider the following assembly:
nop
nop
.globl sym1
sym1:
ret
.section .text2,"ax",@progbits
jmp .text
jmp .text+1
jmp .text+6
jmp sym1
.globl sym2
sym2:
jmp .text2
jmp .text2+1
jmp .text2+20
jmp sym2
jmp sym2@plt
When assembled and then disassembled you will see output something like:
Disassembly of section .text:
0x00000000: 90 nop
0x00000001: 90 nop
sym1:
0x00000002: C3 ret
Disassembly of section .text2:
0x00000000: E9 00 00 00 00 jmp .text+0xFFFFFFFFFFFFFFFC (0000000000000005h)
0x00000005: E9 00 00 00 00 jmp .text+0xFFFFFFFFFFFFFFFD (000000000000000Ah)
0x0000000A: E9 00 00 00 00 jmp sym1 (000000000000000Fh)
0x0000000F: E9 00 00 00 00 jmp sym2 (0000000000000014h)
sym2:
0x00000014: EB EA jmp 0000000000000000h
0x00000016: EB E9 jmp 0000000000000001h
0x00000018: EB FA jmp sym2 (0000000000000014h)
0x0000001A: EB F8 jmp sym2 (0000000000000014h)
0x0000001C: E9 00 00 00 00 jmp sym2 (0000000000000021h)
This is pretty confusing. What is wanted is output more like this:
Disassembly of section .text[0]:
0x00000000: 90 nop
0x00000001: 90 nop
sym1:
0x00000002: C3 ret
Disassembly of section .text2[1]:
0x00000000: E9 ?? ?? ?? ?? jmp .text[0] + 0x0
0x00000005: E9 ?? ?? ?? ?? jmp .text[0] + 0x1
0x0000000A: E9 ?? ?? ?? ?? jmp .text[0] + 0x6 (sym1 + 0x4)
0x0000000F: E9 ?? ?? ?? ?? jmp sym1 + 0x0
sym2:
0x00000014: EB EA jmp .text2[1] + 0x0
0x00000016: EB E9 jmp .text2[1] + 0x1
0x00000018: EB FA jmp .text2[1] + 0x14 (sym2 + 0x0)
0x0000001A: EB F8 jmp .text2[1] + 0x14 (sym2 + 0x0)
0x0000001C: E9 ?? ?? ?? ?? jmp sym2 (via PLT)
Please forgive me for using the output of our internal tools to illustrate
the point (I prepared this internally and don't have much time to write
this email so I just copied and pasted). If you try this with LLVM's binary
tools or GNU's you will see similar results.
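To make the wanted output above concrete, here is a minimal Python sketch of
how the targets of the already-resolved short jumps (the EB xx forms) could be
computed and printed as "section[index] + offset (symbol + offset)". The names
rel8_target and describe are hypothetical and not part of any LLVM or GNU
tool, and the sketch assumes the section indices shown in the section headers
and a per-section symbol list sorted by offset.

```python
import bisect

def rel8_target(insn_offset, insn_bytes):
    """Target of a 2-byte short jmp (opcode EB), as an offset in its own section."""
    assert len(insn_bytes) == 2 and insn_bytes[0] == 0xEB
    # The displacement byte is a signed 8-bit value.
    disp = insn_bytes[1] - 0x100 if insn_bytes[1] >= 0x80 else insn_bytes[1]
    # It is applied relative to the address of the NEXT instruction.
    return insn_offset + len(insn_bytes) + disp

def describe(section, index, offset, symbols):
    """Format a target; symbols is a sorted list of (offset, name) pairs."""
    text = f"{section}[{index}] + 0x{offset:X}"
    # Find the closest symbol at or before the target offset, if any.
    pos = bisect.bisect_right([o for o, _ in symbols], offset) - 1
    if pos >= 0:
        sym_off, name = symbols[pos]
        text += f" ({name} + 0x{offset - sym_off:X})"
    return text

text2_symbols = [(0x14, "sym2")]
for off, raw in [(0x14, "EBEA"), (0x16, "EBE9"), (0x18, "EBFA"), (0x1A, "EBF8")]:
    target = rel8_target(off, bytes.fromhex(raw))
    print(f"0x{off:08X}: jmp {describe('.text2', 1, target, text2_symbols)}")
```

The same describe helper would also cover the section-relative case from the
first section, for example offset 0x6 in .text with sym1 defined at 0x2.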
Concrete suggestions for improvements:
- section-relative targets augmented with symbol information
- ?? to indicate bytes that will be patched by relocations
- targets of PC-relative jumps computed correctly
- section names augmented with their indices (section names are
ambiguous)
- branches via the PLT indicated with added comments
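As an illustration of the "??" suggestion in the list above, the following
Python sketch masks instruction bytes that are covered by a relocation. The
function name render_bytes and the (offset, size) representation of a
relocation are assumptions made for this example; a real tool would take the
field offset and width from the object file's relocation entries.

```python
def render_bytes(insn_bytes, insn_offset, relocs):
    """Hex-dump an instruction, printing ?? for bytes a relocation will patch.

    relocs is an iterable of (section_offset, size_in_bytes) pairs
    describing the fields patched in this section.
    """
    patched = set()
    for r_off, r_size in relocs:
        patched.update(range(r_off, r_off + r_size))
    return " ".join(
        "??" if insn_offset + i in patched else f"{b:02X}"
        for i, b in enumerate(insn_bytes)
    )

# Hypothetical relocations for the first two jumps in .text2: a 4-byte field
# starting one byte after each E9 opcode.
relocs = [(0x1, 4), (0x6, 4)]
print(render_bytes(bytes.fromhex("E900000000"), 0x0, relocs))
print(render_bytes(bytes.fromhex("EBEA"), 0x14, relocs))
```

Bytes outside any relocated field, such as the short jumps after sym2, are
printed unchanged.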
This is not trivial to accomplish. Specifically, computing the targets of
branches will require either more integration between the binary tools and
the disassembler or, possibly, having the binary tools create a fake layout
and then patch up the instructions so that they disassemble "correctly".
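For the relocated E9 jumps, one way to see what the correct target is without
doing any layout is to work from the relocation itself. The Python sketch
below assumes the 4-byte field carries an ELF x86-64 PC-relative relocation
(R_X86_64_PC32 or R_X86_64_PLT32), whose value is S + A - P, and that the
assembler stored the requested offset minus 4 in the addend, which matches the
0xFFFFFFFFFFFFFFFC (-4) shown in the first listing. The function name
pc32_target is hypothetical.

```python
def pc32_target(sym_value, addend, field_offset, next_insn_offset):
    """Offset the branch lands on, relative to the start of the section
    that defines the symbol.

    sym_value        -- the symbol's offset in its section (0 for a section symbol)
    addend           -- the relocation addend A
    field_offset     -- offset of the relocated 4-byte field in the branch's section
    next_insn_offset -- offset of the instruction following the branch
    """
    # The patched displacement is S + A - P and the CPU adds it to the address
    # of the next instruction. (next - P) is a distance within one section, so
    # the final section addresses are not needed to compute it.
    return sym_value + addend + (next_insn_offset - field_offset)

# jmp .text, jmp .text+1 and jmp .text+6 from the example, with assumed addends.
for insn_off, addend in [(0x0, -4), (0x5, -3), (0xA, 2)]:
    target = pc32_target(0, addend, insn_off + 1, insn_off + 5)
    print(f"0x{insn_off:08X}: jmp .text[0] + 0x{target:X}")
```

This only covers the case where the relocated field is the last 4 bytes of the
instruction and the relocation is PC-relative; other relocation types would
need their own handling.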
If you manage to get that done, then I would suggest going further and
trying to enhance the disassembly by adding color coding, outlining, or
ASCII art to the output to show things like loops, if statements, and basic
blocks. As inspiration, see "rich disassembly" in this presentation by Apple:
http://devimages.apple.com/llvm/videos/LLVMMCinPractice.m4v.