[llvm-dev] [EXT] Re: GSoC19: Improve LLVM binary utilities

Krzysztof Parzyszek via llvm-dev llvm-dev at lists.llvm.org
Thu Mar 28 06:53:32 PDT 2019


This augmented output should not be the default; it should only be enabled with an option.

-- 
Krzysztof Parzyszek  mailto:kparzysz at quicinc.com   LLVM compiler development

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of bd1976 llvm via llvm-dev
Sent: Wednesday, March 27, 2019 9:33 AM
To: Jordan Rupprecht <rupprecht at google.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: [EXT] Re: [llvm-dev] GSoC19: Improve LLVM binary utilities

Hi Seiya,

If you want a project that is not trivial but is doable in a summer, that will be a great learning opportunity, and that will be very useful to developers, then I would suggest improving the disassembly of object files on x86_64. I can't count the number of times this has caused confusion.

Consider the following assembly:
    nop
    nop
    .globl sym1
sym1:
    ret

.section .text2,"ax",@progbits
    jmp .text
    jmp .text+1
    jmp .text+6
    jmp sym1
    .globl sym2
sym2:
    jmp .text2
    jmp .text2+1
    jmp .text2+20
    jmp sym2
    jmp sym2@plt
When assembled and then disassembled, you will see output something like:
Disassembly of section .text:
0x00000000: 90                      nop
0x00000001: 90                      nop

sym1:
0x00000002: C3                      ret

Disassembly of section .text2:
0x00000000: E9 00 00 00 00          jmp      .text+0xFFFFFFFFFFFFFFFC (0000000000000005h)
0x00000005: E9 00 00 00 00          jmp      .text+0xFFFFFFFFFFFFFFFD (000000000000000Ah)
0x0000000A: E9 00 00 00 00          jmp      sym1 (000000000000000Fh)
0x0000000F: E9 00 00 00 00          jmp      sym2 (0000000000000014h)

sym2:
0x00000014: EB EA                   jmp      0000000000000000h
0x00000016: EB E9                   jmp      0000000000000001h
0x00000018: EB FA                   jmp      sym2 (0000000000000014h)
0x0000001A: EB F8                   jmp      sym2 (0000000000000014h)
0x0000001C: E9 00 00 00 00          jmp      sym2 (0000000000000021h)
This is pretty confusing. What is wanted is output more like this:
Disassembly of section .text[0]:
0x00000000: 90                      nop
0x00000001: 90                      nop

sym1:
0x00000002: C3                      ret

Disassembly of section .text2[1]:
0x00000000: E9 ?? ?? ?? ??          jmp      .text[0] + 0x0
0x00000005: E9 ?? ?? ?? ??          jmp      .text[0] + 0x1
0x0000000A: E9 ?? ?? ?? ??          jmp      .text[0] + 0x6 (sym1 + 0x4)
0x0000000F: E9 ?? ?? ?? ??          jmp      sym1 + 0x0

sym2:
0x00000014: EB EA                   jmp      .text2[1] + 0x0
0x00000016: EB E9                   jmp      .text2[1] + 0x1
0x00000018: EB FA                   jmp      .text2[1] + 0x14 (sym2 + 0x0)
0x0000001A: EB F8                   jmp      .text2[1] + 0x14 (sym2 + 0x0)
0x0000001C: E9 ?? ?? ?? ??          jmp      sym2 (via PLT)

Please forgive me for using the output of our internal tools to illustrate the point (I prepared this internally and don't have much time to write this email so I just copied and pasted). If you try this with LLVM's binary tools or GNU's you will see similar results.
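To make the source of the confusion concrete, here is a minimal Python sketch of how a disassembler derives a jmp target from the raw bytes alone. The function name jump_target is hypothetical and only the two encodings that appear in the listings (EB rel8 and E9 rel32) are handled. With the rel32 field still zero because the relocation has not been applied, the computed "target" is just the address of the next instruction, which is what the first listing prints.

```python
import struct


def jump_target(address, encoding):
    """Target computed from the raw bytes of an x86 jmp (illustrative only).

    address  -- section-relative address of the instruction
    encoding -- the instruction bytes: EB rel8 or E9 rel32
    """
    opcode = encoding[0]
    if opcode == 0xEB:
        (disp,) = struct.unpack("<b", encoding[1:2])   # signed 8-bit
    elif opcode == 0xE9:
        (disp,) = struct.unpack("<i", encoding[1:5])   # signed 32-bit
    else:
        raise ValueError("only jmp rel8 / rel32 are handled in this sketch")
    # The displacement is relative to the end of the instruction.
    return address + len(encoding) + disp


# Unrelocated rel32 jump at 0x0: the zero field yields the next address.
print(hex(jump_target(0x00, bytes.fromhex("E900000000"))))  # 0x5
# Resolved rel8 jump at 0x14: 0x16 + (-22) lands on the section start.
print(hex(jump_target(0x14, bytes.fromhex("EBEA"))))        # 0x0
```

The short jumps are already resolved by the assembler, so their targets are real; the E9 forms are not, so a tool that prints a computed address for them is showing a number with no meaning.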

Concrete suggestions for improvements:
• section-relative targets augmented with symbol information
• ?? to indicate bytes that will be patched by relocations
• targets of PC-relative jumps computed correctly
• section names augmented with their indices (section names are ambiguous)
• branches via PLT indicated with added comments
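As a rough illustration of the first, second and fourth suggestions, the formatting could look something like the Python sketch below. The helper names describe_target and show_bytes are made up for this example, and the symbol table is assumed to be a plain list of (offset, name) pairs for the section; a real tool would take these from the object file.

```python
import bisect


def describe_target(section, index, offset, symbols):
    """Format 'section[index] + off', adding the nearest preceding symbol.

    symbols -- list of (offset, name) pairs defined in that section
    """
    text = f"{section}[{index}] + {offset:#x}"
    ordered = sorted(symbols)
    pos = bisect.bisect_right([off for off, _ in ordered], offset)
    if pos:  # at least one symbol at or before the target offset
        sym_off, name = ordered[pos - 1]
        text += f" ({name} + {offset - sym_off:#x})"
    return text


def show_bytes(encoding, relocated):
    """Hex-dump an instruction, printing ?? for bytes a relocation patches.

    relocated -- set of byte positions within the instruction
    """
    return " ".join(
        "??" if i in relocated else f"{b:02X}" for i, b in enumerate(encoding)
    )


print(show_bytes(bytes.fromhex("E900000000"), {1, 2, 3, 4}))  # E9 ?? ?? ?? ??
print(describe_target(".text", 0, 0x6, [(2, "sym1")]))
# .text[0] + 0x6 (sym1 + 0x4)
```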
This is not trivial to accomplish. Specifically, computing the targets of branches will require either more integration between the binary tools and the disassembler or, possibly, having the binary tools create a fake layout and then patch up the instructions so that they disassemble "correctly".
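A minimal sketch of the fake-layout idea, in Python, under several assumptions: every section gets an arbitrary, widely spaced fake base address; the relocation is a 32-bit PC-relative one computed as S + A - P (symbol, addend, place); and the function names fake_layout and resolve_pc32 are invented for illustration. This is not how any existing tool does it.

```python
def fake_layout(section_names, spacing=0x10000):
    """Assign each section a distinct fake base address, spaced far apart."""
    return {name: (i + 1) * spacing for i, name in enumerate(section_names)}


def resolve_pc32(bases, where, field_offset, sym_section, sym_value, addend):
    """Apply S + A - P in the fake layout and map the target back.

    where        -- section containing the jump
    field_offset -- offset of the rel32 field within that section
    sym_section  -- section in which the referenced symbol is defined
    sym_value    -- symbol's offset in its section (0 for a section symbol)
    Returns (section_name, offset) of the branch target.
    """
    place = bases[where] + field_offset              # P
    symbol = bases[sym_section] + sym_value          # S
    disp = (symbol + addend - place) & 0xFFFFFFFF    # value patched into rel32
    if disp >= 0x80000000:                           # sign-extend to an int
        disp -= 1 << 32
    target = place + 4 + disp                        # rel32 is the last 4 bytes
    # Attribute the target to the closest fake base, so offsets slightly
    # outside a section (such as .text+6 above) still map back sensibly.
    name = min(bases, key=lambda n: abs(target - bases[n]))
    return name, target - bases[name]


bases = fake_layout([".text", ".text2"])
# 'jmp .text+6' at .text2+0xA: field at 0xB, section symbol, addend 6 - 4.
print(resolve_pc32(bases, ".text2", 0xB, ".text", 0, 2))    # ('.text', 6)
# 'jmp sym1' at .text2+0xF: field at 0x10, sym1 at .text+2, addend -4.
print(resolve_pc32(bases, ".text2", 0x10, ".text", 2, -4))  # ('.text', 2)
```

The resulting (section, offset) pair can then be printed in section-relative form instead of as an absolute address that only exists in the fake layout.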
If you manage to get that done, then I would suggest going further and trying to enhance the disassembly by adding color coding, outlining, or ASCII art to the output to show things like loops, if statements, and basic blocks. As inspiration, see "rich disassembly" in this presentation by Apple: http://devimages.apple.com/llvm/videos/LLVMMCinPractice.m4v.


