[llvm-dev] GSoC19: Improve LLVM binary utilities

Seiya Nuta via llvm-dev llvm-dev at lists.llvm.org
Wed Mar 27 20:50:40 PDT 2019


Hi,

Thank you for your suggestion. It won't be easy, but it's
really attractive to me!

>   * section names augmented with their indices (section names are ambiguous)

Could you explain a little further what "ambiguous" means here?
Do you mean similar-looking section names (e.g., .text1 and .textl)?

Seiya

On 3/27/19 23:32, bd1976 llvm via llvm-dev wrote:
> Hi Seiya,
> 
> If you want a project that is not trivial but is doable in a summer, 
> that will be a great learning opportunity, and that will be very useful 
> to developers, then I would suggest improving the disassembly of object 
> files on x86_64. I can't count the number of times this has caused 
> confusion.
> 
> Consider the following assembly:
> 
>      nop
>      nop
>      .globl sym1
> sym1:
>      ret
> 
> .section .text2,"ax",@progbits
>      jmp .text
>      jmp .text+1
>      jmp .text+6
>      jmp sym1
>      .globl sym2
> sym2:
>      jmp .text2
>      jmp .text2+1
>      jmp .text2+20
>      jmp sym2
>      jmp sym2@plt
> 
> When assembled and then disassembled you will see output something like:
> 
> Disassembly of section .text:
> 0x00000000: 90                      nop
> 0x00000001: 90                      nop
> 
> sym1:
> 0x00000002: C3                      ret
> 
> Disassembly of section .text2:
> 0x00000000: E9 00 00 00 00          jmp      .text+0xFFFFFFFFFFFFFFFC (0000000000000005h)
> 0x00000005: E9 00 00 00 00          jmp      .text+0xFFFFFFFFFFFFFFFD (000000000000000Ah)
> 0x0000000A: E9 00 00 00 00          jmp      sym1 (000000000000000Fh)
> 0x0000000F: E9 00 00 00 00          jmp      sym2 (0000000000000014h)
> 
> sym2:
> 0x00000014: EB EA                   jmp      0000000000000000h
> 0x00000016: EB E9                   jmp      0000000000000001h
> 0x00000018: EB FA                   jmp      sym2 (0000000000000014h)
> 0x0000001A: EB F8                   jmp      sym2 (0000000000000014h)
> 0x0000001C: E9 00 00 00 00          jmp      sym2 (0000000000000021h)
> 
> This is pretty confusing. What is wanted is output more like this:
> 
> Disassembly of section .text[0]:
> 0x00000000: 90                      nop
> 0x00000001: 90                      nop
> 
> sym1:
> 0x00000002: C3                      ret
> 
> Disassembly of section .text2[1]:
> 0x00000000: E9 ?? ?? ?? ??          jmp      .text[0] + 0x0
> 0x00000005: E9 ?? ?? ?? ??          jmp      .text[0] + 0x1
> 0x0000000A: E9 ?? ?? ?? ??          jmp      .text[0] + 0x6 (sym1 + 0x4)
> 0x0000000F: E9 ?? ?? ?? ??          jmp      sym1 + 0x0
> 
> sym2:
> 0x00000014: EB EA                   jmp      .text2[1] + 0x0
> 0x00000016: EB E9                   jmp      .text2[1] + 0x1
> 0x00000018: EB FA                   jmp      .text2[1] + 0x14 (sym2 + 0x0)
> 0x0000001A: EB F8                   jmp      .text2[1] + 0x14 (sym2 + 0x0)
> 0x0000001C: E9 ?? ?? ?? ??          jmp      sym2 (via PLT)
> 
> 
> Please forgive me for using the output of our internal tools to 
> illustrate the point (I prepared this internally and don't have much 
> time to write this email so I just copied and pasted). If you try this 
> with LLVM's binary tools or GNU's you will see similar results.
> 
> Concrete suggestions for improvements:
> 
>   * section-relative targets augmented with symbol information
>   * ?? to indicate bytes that relocations will patch
>   * targets of PC-relative jumps computed correctly
>   * section names augmented with their indices (section names are ambiguous)
>   * branches via PLT indicated with added comments
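
Two of the points in this list are purely presentational, so a rough Python sketch may help pin them down. It is not taken from any existing tool: `section_label` and `format_bytes` are hypothetical names, and the section indices are simply list positions, as in the `.text[0]` / `.text2[1]` example above. One reading of "ambiguous" is that an object file may contain several sections with the same name, which is why the third entry below repeats `.text`.

```python
def section_label(name, index):
    """Return a label such as '.text[0]' that stays unique even if names repeat."""
    return f"{name}[{index}]"


def format_bytes(raw, reloc_ranges):
    """Render instruction bytes as hex, showing relocated bytes as '??'.

    raw          -- the instruction's bytes as stored in the object file
    reloc_ranges -- (start, length) byte ranges, relative to the start of the
                    instruction, that a relocation will overwrite at link time
    """
    patched = set()
    for start, length in reloc_ranges:
        patched.update(range(start, start + length))
    return " ".join(
        "??" if i in patched else f"{b:02X}" for i, b in enumerate(raw)
    )


# Hypothetical section list: two sections share the name '.text',
# so only the index tells them apart.
sections = [".text", ".text2", ".text"]
labels = [section_label(name, i) for i, name in enumerate(sections)]

# A 5-byte 'jmp rel32' whose 4-byte displacement is covered by a relocation.
relocated_jmp = format_bytes(bytes([0xE9, 0x00, 0x00, 0x00, 0x00]), [(1, 4)])

# A 2-byte 'jmp rel8' with no relocation: every byte is printed as stored.
short_jmp = format_bytes(bytes([0xEB, 0xEA]), [])
```

Printing `??` makes it obvious that the zero bytes stored in the object file are placeholders rather than a real displacement.
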
> 
> This is not trivial to accomplish. Specifically, computing the targets 
> of branches will either require more integration between the binary 
> tools and the disassembler or, possibly, the binary tools could create a 
> fake layout and then patch up the instructions so that they disassemble 
> "correctly".
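
The arithmetic behind the target computation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any existing tool works: the function names are hypothetical, and the relocation case assumes an ELF x86-64 32-bit PC-relative relocation (such as R_X86_64_PC32) whose 4-byte field is the last part of the instruction. Under that assumption the linker stores S + A - P in the field, and the CPU adds the displacement to the address of the next instruction (P + 4), so the jump lands at S + A + 4.

```python
def to_signed(value, bits):
    """Interpret an unsigned integer as a two's-complement value of `bits` bits."""
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


def branch_target(insn_offset, insn_length, displacement):
    """Target of a relative jump whose displacement is already final.

    The displacement is relative to the end of the instruction.
    """
    return insn_offset + insn_length + displacement


def reloc_target_offset(addend, field_size=4):
    """Offset from the relocation's symbol at which the jump will land.

    Assumes a PC-relative relocation whose field is the last `field_size`
    bytes of the instruction, so the target is symbol + addend + field_size.
    """
    return to_signed(addend, 64) + field_size


# Short jumps from the listing above: no relocation, displacement is final.
# 'EB EA' at offset 0x14 and 'EB FA' at offset 0x18, both 2 bytes long.
target_a = branch_target(0x14, 2, to_signed(0xEA, 8))
target_b = branch_target(0x18, 2, to_signed(0xFA, 8))

# Relocated jumps: the addends shown as .text+0xFFFFFFFFFFFFFFFC and
# .text+0xFFFFFFFFFFFFFFFD correspond to 'jmp .text' and 'jmp .text+1'.
offset_a = reloc_target_offset(0xFFFFFFFFFFFFFFFC)
offset_b = reloc_target_offset(0xFFFFFFFFFFFFFFFD)
```

Seen this way, the confusing `.text+0xFFFFFFFFFFFFFFFD` in the first listing is just the raw addend (-3) printed as an unsigned number, without the +4 adjustment that would turn it into `.text + 0x1`.
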
> 
> If you manage to get that done, then I would suggest going further and 
> trying to enhance the disassembly by adding color coding, outlining, or 
> ASCII art to the output to show things like loops, if statements, and 
> basic blocks. As inspiration see "rich disassembly" in this presentation 
> by Apple: http://devimages.apple.com/llvm/videos/LLVMMCinPractice.m4v.