[llvm-dev] GSoC19: Improve LLVM binary utilities
Seiya Nuta via llvm-dev
llvm-dev at lists.llvm.org
Wed Mar 27 20:50:40 PDT 2019
Hi,
Thank you for your suggestion. It won't be easy but it's
really attractive to me!
> * section names augmented with their indices (section names are
> ambiguous)
Could you explain a little further what "ambiguous" means here?
Do you mean similar-looking section names (e.g., .text1 and .textl)?
Seiya
On 3/27/19 23:32, bd1976 llvm via llvm-dev wrote:
> Hi Seiya,
>
> If you want a project that is not trivial but is doable in a summer, will
> be a great learning opportunity, and will be very useful to
> developers, then I would suggest improving the disassembly of object
> files on x86_64. I can't count the number of times this has caused
> confusion.
>
> Consider the following assembly:
>
> nop
> nop
> .globl sym1
> sym1:
> ret
>
> .section .text2,"ax",@progbits
> jmp .text
> jmp .text+1
> jmp .text+6
> jmp sym1
> .globl sym2
> sym2:
> jmp .text2
> jmp .text2+1
> jmp .text2+20
> jmp sym2
> jmp sym2@plt
>
> When assembled and then disassembled you will see output something like:
>
> Disassembly of section .text:
> 0x00000000: 90 nop
> 0x00000001: 90 nop
>
> sym1:
> 0x00000002: C3 ret
>
> Disassembly of section .text2:
> 0x00000000: E9 00 00 00 00 jmp .text+0xFFFFFFFFFFFFFFFC (0000000000000005h)
> 0x00000005: E9 00 00 00 00 jmp .text+0xFFFFFFFFFFFFFFFD (000000000000000Ah)
> 0x0000000A: E9 00 00 00 00 jmp sym1 (000000000000000Fh)
> 0x0000000F: E9 00 00 00 00 jmp sym2 (0000000000000014h)
>
> sym2:
> 0x00000014: EB EA jmp 0000000000000000h
> 0x00000016: EB E9 jmp 0000000000000001h
> 0x00000018: EB FA jmp sym2 (0000000000000014h)
> 0x0000001A: EB F8 jmp sym2 (0000000000000014h)
> 0x0000001C: E9 00 00 00 00 jmp sym2 (0000000000000021h)
>
> This is pretty confusing. What is wanted is output more like this:
>
> Disassembly of section .text[0]:
> 0x00000000: 90 nop
> 0x00000001: 90 nop
>
> sym1:
> 0x00000002: C3 ret
>
> Disassembly of section .text2[1]:
> 0x00000000: E9 ?? ?? ?? ?? jmp .text[0] + 0x0
> 0x00000005: E9 ?? ?? ?? ?? jmp .text[0] + 0x1
> 0x0000000A: E9 ?? ?? ?? ?? jmp .text[0] + 0x6 (sym1 + 0x4)
> 0x0000000F: E9 ?? ?? ?? ?? jmp sym1 + 0x0
>
> sym2:
> 0x00000014: EB EA jmp .text2[1] + 0x0
> 0x00000016: EB E9 jmp .text2[1] + 0x1
> 0x00000018: EB FA jmp .text2[1] + 0x14 (sym2 + 0x0)
> 0x0000001A: EB F8 jmp .text2[1] + 0x14 (sym2 + 0x0)
> 0x0000001C: E9 ?? ?? ?? ?? jmp sym2 (via PLT)
>
>
> Please forgive me for using the output of our internal tools to
> illustrate the point (I prepared this internally and don't have much
> time to write this email so I just copied and pasted). If you try this
> with LLVM's binary tools or GNU's you will see similar results.
>
> Concrete suggestions for improvements:
>
> * section-relative targets augmented with symbol information
> * ?? to indicate bytes that will be patched by relocations
> * targets of PC-relative jumps computed correctly
> * section names augmented with their indices (section names are ambiguous)
> * branches via the PLT indicated with added comments
>
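
To make two of the suggestions above concrete (the "??" bytes and the indexed section names), here is a minimal Python sketch. The helper names render_bytes and section_label are hypothetical, as is the idea of passing relocations as (offset, size) pairs; this is not how llvm-objdump or any existing tool is structured.

```python
def render_bytes(code: bytes, reloc_ranges=()):
    """Hex-dump `code`, printing ?? for every byte covered by a relocation.

    `reloc_ranges` is an iterable of (offset, size) pairs, with offsets
    relative to the start of `code` (an assumption made for this sketch).
    """
    masked = set()
    for start, size in reloc_ranges:
        masked.update(range(start, start + size))
    return " ".join(
        "??" if i in masked else f"{b:02X}" for i, b in enumerate(code)
    )


def section_label(name: str, index: int) -> str:
    """Disambiguate a section name by appending its section index."""
    return f"{name}[{index}]"


# jmp rel32 whose 4-byte displacement is the target of a relocation.
print(render_bytes(bytes([0xE9, 0, 0, 0, 0]), [(1, 4)]))
# jmp rel8 that is fully resolved at assembly time: nothing is masked.
print(render_bytes(bytes([0xEB, 0xEA])))
print("Disassembly of section " + section_label(".text2", 1) + ":")
```

The displacement bytes stored in a relocatable object are only placeholders, so masking them avoids suggesting that "00 00 00 00" is a real displacement.
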
> This is not trivial to accomplish. Specifically, computing the targets of
> branches will require either more integration between the binary tools
> and the disassembler or, possibly, having the binary tools create a fake
> layout and then patch up the instructions so that they disassemble
> "correctly".
>
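
As a rough illustration of the branch-target arithmetic involved, the Python sketch below handles only the two JMP encodings that appear in the example (EB rel8 and E9 rel32). The function names are hypothetical. For a relocated rel32 field it assumes a PC-relative relocation applied to the last four bytes of the instruction, which is why 4 is added back to the addend.

```python
import struct


def jmp_target(address: int, encoding: bytes) -> int:
    """Section-relative target of an already-resolved x86-64 JMP."""
    opcode = encoding[0]
    if opcode == 0xEB:  # JMP rel8: signed 8-bit displacement
        (disp,) = struct.unpack_from("<b", encoding, 1)
        length = 2
    elif opcode == 0xE9:  # JMP rel32: signed little-endian 32-bit displacement
        (disp,) = struct.unpack_from("<i", encoding, 1)
        length = 5
    else:
        raise ValueError(f"unsupported opcode {opcode:#04x}")
    # The displacement is relative to the address of the next instruction.
    return address + length + disp


def relocated_offset(addend: int) -> int:
    """Offset from the relocation's symbol for a relocated JMP rel32.

    Assumes the PC-relative field occupies the instruction's last 4 bytes,
    so the next instruction starts 4 bytes after the patched location.
    """
    return addend + 4


print(hex(jmp_target(0x14, bytes([0xEB, 0xEA]))))  # jmp .text2
print(hex(jmp_target(0x18, bytes([0xEB, 0xFA]))))  # jmp .text2+20
print(relocated_offset(-4))  # addend shown as 0xFFFFFFFFFFFFFFFC
print(relocated_offset(-3))  # addend shown as 0xFFFFFFFFFFFFFFFD
```

With the bytes from the example, the two rel8 jumps resolve to 0x0 and 0x14 within .text2, and the addends -4 and -3 become offsets 0 and 1 from .text, matching the wanted output. Applying jmp_target to an unrelocated E9 00 00 00 00 just yields the address of the next instruction, which is the source of the misleading targets in the first listing.
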
> If you manage to get that done, then I would suggest going further and
> trying to enhance the disassembly by adding color coding/outlining/ASCII
> art to the output to show things like loops, if statements, and basic
> blocks. As inspiration, see "rich disassembly" in this presentation by
> Apple: http://devimages.apple.com/llvm/videos/LLVMMCinPractice.m4v.