[llvm-dev] GSoC19: Improve LLVM binary utilities
Jake Ehrlich via llvm-dev
llvm-dev at lists.llvm.org
Wed Mar 27 10:54:12 PDT 2019
This is what I meant by llvm-objdump improvements.
On Wed, Mar 27, 2019 at 7:32 AM bd1976 llvm <bd1976llvm at gmail.com> wrote:
> Hi Seiya,
>
> If you want a project that is not trivial but is doable in a summer, will
> be a great learning opportunity, and will be very useful to developers, then
> I would suggest improving the disassembly of object files on x86_64. I
> can't count the number of times this has caused confusion.
>
> Consider the following assembly:
>
> nop
> nop
> .globl sym1
> sym1:
> ret
>
> .section .text2,"ax",@progbits
> jmp .text
> jmp .text+1
> jmp .text+6
> jmp sym1
> .globl sym2
> sym2:
> jmp .text2
> jmp .text2+1
> jmp .text2+20
> jmp sym2
> jmp sym2@plt
>
> When assembled and then disassembled you will see output something like:
>
> Disassembly of section .text:
> 0x00000000: 90 nop
> 0x00000001: 90 nop
>
> sym1:
> 0x00000002: C3 ret
>
> Disassembly of section .text2:
> 0x00000000: E9 00 00 00 00 jmp .text+0xFFFFFFFFFFFFFFFC (0000000000000005h)
> 0x00000005: E9 00 00 00 00 jmp .text+0xFFFFFFFFFFFFFFFD (000000000000000Ah)
> 0x0000000A: E9 00 00 00 00 jmp sym1 (000000000000000Fh)
> 0x0000000F: E9 00 00 00 00 jmp sym2 (0000000000000014h)
>
> sym2:
> 0x00000014: EB EA jmp 0000000000000000h
> 0x00000016: EB E9 jmp 0000000000000001h
> 0x00000018: EB FA jmp sym2 (0000000000000014h)
> 0x0000001A: EB F8 jmp sym2 (0000000000000014h)
> 0x0000001C: E9 00 00 00 00 jmp sym2 (0000000000000021h)
>
> This is pretty confusing. What is wanted is output more like this:
>
> Disassembly of section .text[0]:
> 0x00000000: 90 nop
> 0x00000001: 90 nop
>
> sym1:
> 0x00000002: C3 ret
>
> Disassembly of section .text2[1]:
> 0x00000000: E9 ?? ?? ?? ?? jmp .text[0] + 0x0
> 0x00000005: E9 ?? ?? ?? ?? jmp .text[0] + 0x1
> 0x0000000A: E9 ?? ?? ?? ?? jmp .text[0] + 0x6 (sym1 + 0x4)
> 0x0000000F: E9 ?? ?? ?? ?? jmp sym1 + 0x0
>
> sym2:
> 0x00000014: EB EA jmp .text2[1] + 0x0
> 0x00000016: EB E9 jmp .text2[1] + 0x1
> 0x00000018: EB FA jmp .text2[1] + 0x14 (sym2 + 0x0)
> 0x0000001A: EB F8 jmp .text2[1] + 0x14 (sym2 + 0x0)
> 0x0000001C: E9 ?? ?? ?? ?? jmp sym2 (via PLT)
>
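
A note on the ?? placeholders in the wanted output: the idea is that bytes a relocation will overwrite are not printed as if they were meaningful. Below is a minimal Python sketch of that rendering step. The function name and the (offset, size) representation of a relocation are hypothetical, chosen only for illustration; this is not how llvm-objdump is structured.

```python
from __future__ import annotations


def render_bytes(code: bytes, reloc_ranges: list[tuple[int, int]]) -> str:
    """Hex-dump one instruction, printing ?? for relocated bytes.

    `reloc_ranges` holds hypothetical (offset, size) pairs, measured from
    the start of the instruction, one pair per relocation that patches it.
    """
    patched = set()
    for offset, size in reloc_ranges:
        patched.update(range(offset, offset + size))
    return " ".join(
        "??" if i in patched else f"{b:02X}" for i, b in enumerate(code)
    )


# jmp rel32 whose 4-byte displacement (bytes 1..4) is covered by a relocation
print(render_bytes(bytes.fromhex("E900000000"), [(1, 4)]))
# short jump with no relocation: all bytes are printed as-is
print(render_bytes(bytes.fromhex("EBEA"), []))
```

A real tool would derive the ranges from the relocation type (which determines the field width) and the relocation offset within the section.
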
>
> Please forgive me for using the output of our internal tools to illustrate
> the point (I prepared this internally and don't have much time to write
> this email so I just copied and pasted). If you try this with LLVM's binary
> tools or GNU's you will see similar results.
>
> Concrete suggestions for improvements:
>
> - section-relative targets augmented with symbol information
> - ?? to indicate relocation patches
> - targets of PC-relative jumps computed correctly
> - section names augmented with their indices (section names are
> ambiguous)
> - branches via PLT indicated with added comments
>
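
To make the first and fourth suggestions concrete, here is a rough Python sketch of how a target could be printed as a section name plus index plus offset, followed by the nearest preceding symbol when there is one. The function and its arguments are made up for illustration, and the symbol table is assumed to be a list of (offset, name) pairs already sorted by offset.

```python
from __future__ import annotations

import bisect


def format_target(section: str, index: int, offset: int,
                  symbols: list[tuple[int, str]]) -> str:
    """Render a target as "name[index] + 0xOFF (sym + 0xN)".

    `symbols` lists the (offset, name) pairs defined in that section,
    sorted by offset. The symbol part is omitted when no symbol starts
    at or before `offset`.
    """
    text = f"{section}[{index}] + {offset:#x}"
    starts = [off for off, _ in symbols]
    pos = bisect.bisect_right(starts, offset)
    if pos:
        sym_off, name = symbols[pos - 1]
        text += f" ({name} + {offset - sym_off:#x})"
    return text


text_syms = [(0x2, "sym1")]
print(format_target(".text", 0, 0x1, text_syms))
print(format_target(".text", 0, 0x6, text_syms))
```

With the symbols from the example, offset 0x6 in .text comes out as ".text[0] + 0x6 (sym1 + 0x4)", and offset 0x1 has no symbol annotation because sym1 starts later.
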
> This is not trivial to accomplish. Specifically, computing the targets of
> branches will require either more integration between the binary tools and
> the disassembler or, possibly, having the binary tools create a fake layout
> and then patch up the instructions so that they disassemble "correctly".
>
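
For what it's worth, the arithmetic itself is small. The sketch below assumes the relocated jumps use a PC-relative 32-bit relocation with an explicit addend (the S + A - P form of R_X86_64_PC32, where P is the address of the 4-byte field), and that the field is the last thing in the instruction, as it is for jmp rel32. Under those assumptions the target is simply S + A + 4, which is why an addend of -4 (printed above as 0xFFFFFFFFFFFFFFFC) really means "+ 0x0". The function names are hypothetical.

```python
def rel32_branch_target(sym_offset: int, addend: int) -> int:
    """Section-relative target of a relocated `jmp rel32`.

    Assumes a PC-relative relocation that stores S + A - P in the field.
    The CPU adds the displacement to the address of the next instruction,
    P + 4, so the target is (P + 4) + (S + A - P) = S + A + 4.
    """
    field_size = 4
    return sym_offset + addend + field_size


def rel8_branch_target(insn_addr: int, code: bytes) -> int:
    """Target of an unrelocated short jump (opcode EB, signed 8-bit disp)."""
    disp = int.from_bytes(code[1:2], "little", signed=True)
    return insn_addr + len(code) + disp


# jmp .text and jmp .text+1: addends -4 and -3 against the .text section symbol
print(hex(rel32_branch_target(0x0, -4)), hex(rel32_branch_target(0x0, -3)))
# the short jump at 0x18 in .text2 (bytes EB FA)
print(hex(rel8_branch_target(0x18, bytes.fromhex("EBFA"))))
```

The hard part is not this calculation but getting the relocation and section information to the code that prints the operand, which is the integration problem described above.
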
> If you manage to get that done, then I would suggest going further and
> trying to enhance the disassembly by adding color coding/outlining/ASCII
> art to the output to show things like loops, if statements, and basic blocks.
> For inspiration, see "rich disassembly" in this presentation by Apple:
> http://devimages.apple.com/llvm/videos/LLVMMCinPractice.m4v.
>
>