[llvm-dev] GSoC19: Improve LLVM binary utilities

bd1976 llvm via llvm-dev llvm-dev at lists.llvm.org
Wed Mar 27 07:32:30 PDT 2019


Hi Seiya,

If you want a project that is not trivial but is doable in a summer, will
be a great learning opportunity, and will be very useful to developers,
then I would suggest improving the disassembly of object files on x86_64.
I can't count the number of times this has caused confusion.

Consider the following assembly:

    nop
    nop
    .globl sym1
sym1:
    ret

.section .text2,"ax",@progbits
    jmp .text
    jmp .text+1
    jmp .text+6
    jmp sym1
    .globl sym2
sym2:
    jmp .text2
    jmp .text2+1
    jmp .text2+20
    jmp sym2
    jmp sym2@plt

When assembled and then disassembled you will see output something like:

Disassembly of section .text:
0x00000000: 90                      nop
0x00000001: 90                      nop

sym1:
0x00000002: C3                      ret

Disassembly of section .text2:
0x00000000: E9 00 00 00 00          jmp      .text+0xFFFFFFFFFFFFFFFC (0000000000000005h)
0x00000005: E9 00 00 00 00          jmp      .text+0xFFFFFFFFFFFFFFFD (000000000000000Ah)
0x0000000A: E9 00 00 00 00          jmp      sym1 (000000000000000Fh)
0x0000000F: E9 00 00 00 00          jmp      sym2 (0000000000000014h)

sym2:
0x00000014: EB EA                   jmp      0000000000000000h
0x00000016: EB E9                   jmp      0000000000000001h
0x00000018: EB FA                   jmp      sym2 (0000000000000014h)
0x0000001A: EB F8                   jmp      sym2 (0000000000000014h)
0x0000001C: E9 00 00 00 00          jmp      sym2 (0000000000000021h)

This is pretty confusing. What is wanted is output more like this:

Disassembly of section .text[0]:
0x00000000: 90                      nop
0x00000001: 90                      nop

sym1:
0x00000002: C3                      ret

Disassembly of section .text2[1]:
0x00000000: E9 ?? ?? ?? ??          jmp      .text[0] + 0x0
0x00000005: E9 ?? ?? ?? ??          jmp      .text[0] + 0x1
0x0000000A: E9 ?? ?? ?? ??          jmp      .text[0] + 0x6 (sym1 + 0x4)
0x0000000F: E9 ?? ?? ?? ??          jmp      sym1 + 0x0

sym2:
0x00000014: EB EA                   jmp      .text2[1] + 0x0
0x00000016: EB E9                   jmp      .text2[1] + 0x1
0x00000018: EB FA                   jmp      .text2[1] + 0x14 (sym2 + 0x0)
0x0000001A: EB F8                   jmp      .text2[1] + 0x14 (sym2 + 0x0)
0x0000001C: E9 ?? ?? ?? ??          jmp      sym2 (via PLT)
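
As a cross-check on the two listings, here is a minimal Python sketch of how
the target of an x86-64 `jmp rel8` (EB) or `jmp rel32` (E9) is derived: the
signed displacement is added to the address of the next instruction. The
function name `branch_target` is made up for illustration and is not part of
any tool. It also shows why an unrelocated `E9 00 00 00 00` decodes as "the
next instruction", which is where values like 0000000000000005h in the first
listing come from.

```python
def branch_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Return the section-relative target of a JMP rel8 or JMP rel32.

    insn_addr is the offset of the instruction within its section.
    """
    opcode = insn_bytes[0]
    if opcode == 0xEB:      # JMP rel8: one signed displacement byte
        disp = int.from_bytes(insn_bytes[1:2], "little", signed=True)
    elif opcode == 0xE9:    # JMP rel32: four signed displacement bytes
        disp = int.from_bytes(insn_bytes[1:5], "little", signed=True)
    else:
        raise ValueError("not a JMP rel8/rel32 instruction")
    # The displacement is relative to the end of the instruction.
    return insn_addr + len(insn_bytes) + disp


# Short jumps from the listing: fully resolved by the assembler.
print(hex(branch_target(0x14, bytes([0xEB, 0xEA]))))  # 0x0
print(hex(branch_target(0x18, bytes([0xEB, 0xFA]))))  # 0x14

# Unrelocated rel32: the zero placeholder points at the next instruction.
print(hex(branch_target(0x00, bytes([0xE9, 0, 0, 0, 0]))))  # 0x5
```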


Please forgive me for using the output of our internal tools to illustrate
the point (I prepared this internally and don't have much time to write
this email so I just copied and pasted). If you try this with LLVM's binary
tools or GNU's you will see similar results.

Concrete suggestions for improvements:

   - section-relative targets augmented with symbol information
   - ?? to mark bytes that will be patched by relocations
   - targets of PC-relative jumps computed correctly
   - section names augmented with their indices (section names are
   ambiguous)
   - branches via PLT indicated with added comments
This is not trivial to accomplish. Specifically, computing the targets of
branches will require either more integration between the binary tools and
the disassembler or, possibly, having the binary tools create a fake layout
and then patch up the instructions so that they disassemble "correctly".
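
A very rough Python sketch of the "fake layout" idea follows, under several
assumptions of mine: each section gets a made-up base address spaced
`FAKE_STRIDE` apart, every section is smaller than that stride, there are few
enough sections for a rel32 displacement to reach, and only PC-relative 32-bit
relocations (computed as S + A - P, as for R_X86_64_PC32) are handled. All
names here are hypothetical. The addend of target offset minus 4 matches the
`.text+0xFFFFFFFFFFFFFFFC` shown for `jmp .text` in the first listing.

```python
# Hypothetical spacing between fake section base addresses.
FAKE_STRIDE = 0x0100_0000


def fake_base(section_index: int) -> int:
    """Made-up load address for a section; index 0 starts at FAKE_STRIDE."""
    return (section_index + 1) * FAKE_STRIDE


def patch_pc32(code, section_index, field_offset,
               target_section_index, symbol_value, addend):
    """Apply S + A - P to the 4-byte field at field_offset in code."""
    s = fake_base(target_section_index) + symbol_value
    p = fake_base(section_index) + field_offset
    value = (s + addend - p) & 0xFFFFFFFF
    code[field_offset:field_offset + 4] = value.to_bytes(4, "little")


def jmp_rel32_target(code, section_index, insn_offset):
    """Decode a patched JMP rel32 and map it back to (section, offset)."""
    if code[insn_offset] != 0xE9:
        raise ValueError("not a JMP rel32 instruction")
    disp = int.from_bytes(code[insn_offset + 1:insn_offset + 5],
                          "little", signed=True)
    address = fake_base(section_index) + insn_offset + 5 + disp
    return address // FAKE_STRIDE - 1, address % FAKE_STRIDE


# 'jmp .text+6' at offset 0xA of .text2 (index 1): the relocated field is
# at 0xB, the symbol is the .text section symbol (value 0), addend 6 - 4.
text2 = bytearray(0x21)
text2[0x0A] = 0xE9
patch_pc32(text2, 1, 0x0B, 0, 0, 6 - 4)
result = jmp_rel32_target(text2, 1, 0x0A)
print(result)  # (0, 6)
```

The (section index, offset) pair that comes back is what a tool would then
print in section-relative form, for example as `.text[0] + 0x6`.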

If you manage to get that done, then I would suggest going further and
trying to enhance the disassembly by adding color coding, outlining, or
ASCII art to the output to show things like loops, if statements, and basic
blocks. As inspiration, see "rich disassembly" in this presentation by Apple:
http://devimages.apple.com/llvm/videos/LLVMMCinPractice.m4v.