[LLVMdev] Making llvm-objdump more like GNU objdump

Kevin Enderby enderby at apple.com
Thu Dec 4 16:39:50 PST 2014


Hi Steve,

For this issue it appears llvm’s disassembler uses a negative hex value:

% cat x.s
.byte 0x83, 0xc0, 0x9c
% clang -c x.s
% otool -tvj x.o 
x.o:
(__TEXT,__text) section
0000000000000000	83c09c          	addl	$-0x64, %eax

and that assembles to the same:

% cat y.s
addl	$-0x64, %eax
% clang -c y.s
% otool -tvj y.o 
y.o:
(__TEXT,__text) section
0000000000000000	83c09c          	addl	$-0x64, %eax

The old built-in disassembler in otool(1) does this:

% otool -Qtv y.o 
y.o:
(__TEXT,__text) section
0000000000000000	addl	$0x9c,%eax

So while I’m not a fan of a negative hex value, it seems to make sense for the llvm disassembler, as it allows the disassembly and assembly to follow the golden rule.
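
To make the three printing styles in this thread concrete, here is a minimal Python sketch of the imm8 byte 0x9c printed as signed hex (llvm), as the raw byte (old otool), and sign-extended to the operand width (GNU objdump), followed by a round trip through a toy encoder. All function names are hypothetical, and the encoder is only an assumption-level model of the two encodings shown in this thread for addl with %eax (83 c0 ib and 05 id). It is not how llvm or any real assembler is implemented.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def fmt_signed_hex(imm8):
    """Signed hex, as in the llvm-based otool output above (e.g. -0x64)."""
    v = sign_extend(imm8, 8)
    return f"-0x{-v:x}" if v < 0 else f"0x{v:x}"


def fmt_gnu(imm8, operand_bits=32):
    """Hex sign-extended to the logical operand width, as GNU objdump prints."""
    v = sign_extend(imm8, 8) & ((1 << operand_bits) - 1)
    return f"0x{v:x}"


def fmt_raw(imm8):
    """Raw encoded byte, as the old built-in otool disassembler prints."""
    return f"0x{imm8 & 0xFF:x}"


def parse_imm(text):
    """Parse an immediate; a hex literal without a minus sign is unsigned."""
    return int(text, 0)


def encode_addl_eax(imm):
    """Toy model (assumption): use the imm8 form only if the 32-bit value
    fits in a signed byte, otherwise fall back to the imm32 form."""
    v = sign_extend(imm, 32)
    if -128 <= v <= 127:
        return bytes([0x83, 0xC0, v & 0xFF])
    return bytes([0x05]) + (v & 0xFFFFFFFF).to_bytes(4, "little")


# Print each style and the bytes it reassembles to under the toy model.
for name, fmt in [("llvm", fmt_signed_hex), ("gnu", fmt_gnu), ("raw", fmt_raw)]:
    text = fmt(0x9C)
    print(name, text, encode_addl_eax(parse_imm(text)).hex())
```

Under this model the signed and the width-extended forms both reassemble to 83 c0 9c, while the raw byte form reassembles to the 05 9c 00 00 00 encoding, which is the mismatch Steve describes below.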

Kev

> On Dec 4, 2014, at 3:54 PM, Steve King <steve at metrokings.com> wrote:
> 
> Another wrinkle is that hex values are parsed as unsigned.  For
> example, take this instruction on x86:
> 
> 83 c0 9c     addl $-100, %eax
> 
> Keeping to imm8, -100 is a 0x9C in hex.  Suppose llvm-objdump
> disassembled the instruction this way:
> 
> addl $0x9C,%eax
> 
> Re-assembling results in a different and wrong instruction.
> 
> 05 9c 00 00 00   addl $0x9C, %eax
> 
> I assume we have a golden rule that reassembling our disassembly
> should get back to the same binary.
> 
> For reference, GNU objdump prints hex but knows the width of operand
> and extends appropriately:
> 
> 83 c0 9c              add    $0xffffff9c,%eax
> 05 9c 00 00 00     add    $0x9c,%eax
> 
> Unfortunately, the logical operand width doesn't seem to be handy in
> the target InstPrinter code.  Any ideas how best to find the logical
> operand width?
> 
> Regards,
> -steve
> 
> 
> On Wed, Dec 3, 2014 at 5:09 PM, Kevin Enderby <enderby at apple.com> wrote:
>> 
>> On Dec 3, 2014, at 3:12 PM, Steve King <steve at metrokings.com> wrote:
>> 
>>> OK.  Let's try a specific example:  At least for ELF files, GNU
>>> objdump prints operand values in hex.  AFAIK, hex is not just the
>>> default, but the only choice.  On the other hand, llvm-objdump prints
>>> operand values in decimal and ignores the --print-imm-hex option for
>>> ELF.
>>> 
>>> How about a patch to print operands in hex for ELF?  Good place to start?
>> 
>> Seems like a good place to start if you want to create a patch that honors the --print-imm-hex option for ELF files.
>> 
>> At one point I had to hook up the existing -no-show-raw-insn option to the Mach-O parser code in llvm-objdump to allow me to test its output against darwin’s otool(1).  And later I even had to add the -no-show-raw-insn option to darwin’s otool(1) so that arm64 code could also be diff’ed.
>> 
>> In talking to Jim Grosbach today, the idea is to first get all the functionality implemented.  Then later worry about getting the packaging stuff like the defaults for all the options to match the native tool we are trying to replace.
>> 
>>> 
>>> On Mon, Dec 1, 2014 at 5:49 PM, Kevin Enderby <enderby at apple.com> wrote:
>>>> There is currently a -macho option to llvm-objdump ("Use MachO specific object file parser") behind which I’m hiding the disassembly code specific to Mach-O.  Currently it is only used with the -disassemble option, but one could see it being used for other things.  As Jim points out, though, the output today for some things, such as -private-headers, is controlled by the container.  There are also flags like -exports-trie, -rebase, -bind, etc. that are really Mach-O options.
>>>> 
>>>> As far as the symbolizing work goes, it can be relevant for ELF files, and the code I did can be used as a model for hooking it up for ELF files.  But the real work of the callbacks is very specific to each type of object file.
>>>> 
>>>> Kev
>>>> 
>>>> On Dec 1, 2014, at 5:24 PM, Jim Grosbach <grosbach at apple.com> wrote:
>>>> 
>>>>> At least for now, I don’t expect it to become all that unwieldy. Any behavioral differences should be easily separable into different classes and source files. If as things progress it becomes obvious that there’s really not much of anything in common other than the general nature of the tools, it’s easy to split them apart.
>>>>> 
>>>>> -Jim
>>>>> 
>>>>>> On Dec 1, 2014, at 5:20 PM, Steve King <steve at metrokings.com> wrote:
>>>>>> 
>>>>>> Hi guys, thanks for responding.  Will mimicking both otool and objdump
>>>>>> in one binary become unwieldy?  Maybe a disassembler library would be
>>>>>> a better way to factor out common code?  For example, will Kevin's
>>>>>> symbolizing work be relevant for ELF files?
>>>>>> Regards,
>>>>>> -steve
>>>>>> 
>>>>>> 
>>>>>> On Mon, Dec 1, 2014 at 4:50 PM, Jim Grosbach <grosbach at apple.com> wrote:
>>>>>>> Hey folks,
>>>>>>> 
>>>>>>> It’s great to see more interest in the supporting tools like objdump and such. I very much agree that bringing llvm-objdump up to feature parity (to start with) compared to both otool(1) and objdump(1) is a great goal. The default output formatting is easy enough to get right by having it be controlled by the container format (otool style for macho, objdump style for ELF). Kevin’s right that where this gets a bit interesting is command line option handling. The prevailing wisdom from clang and lld so far seems to be the alternatives Kevin mentions: sniffing argv[0] and/or having a --flavor or --format option. IMO, for now we can just do the latter, which is the simpler thing, while we get the real functionality in place. Then when we’re ready to use llvm-objdump to replace the system version (optionally, as packagers decide to opt in), we can figure out the right way to make that transition nice and clean.
>>>>>>> 
>>>>>>> -jim
>>>>>>> 
>>>>>>> 
>>>>>>>> On Dec 1, 2014, at 4:40 PM, Kevin Enderby <enderby at apple.com> wrote:
>>>>>>>> 
>>>>>>>> Hi Steve,
>>>>>>>> 
>>>>>>>> I’ve been trying to get the functionality of llvm-objdump to match that of darwin’s otool(1).  In adding support for symbolic disassembly, and so that the disassembly of very large files would diff cleanly during testing, I added a few options to llvm-objdump and to otool(1).  For example, these are the two command lines I would use for testing:
>>>>>>>> 
>>>>>>>> llvm-objdump -d -m -no-show-raw-insn -full-leading-addr -print-imm-hex …
>>>>>>>> otool -tV -U -no-show-raw-insn …
>>>>>>>> 
>>>>>>>> Long term I hope to see llvm-objdump take over all of darwin’s otool(1) functionality.  I’m not sure of the best way of doing this for command line options, as the trick of passing them differently based on argv[0] may not work.  There may need to be some wrapper to do that.  And there may also need to be some option like llvm-nm’s "-format XXX" to get the output to match so scripts can use the output.
>>>>>>>> 
>>>>>>>> I’ve Cc’ed Jim Grosbach as he may have some guidance on this.
>>>>>>>> 
>>>>>>>> My thoughts,
>>>>>>>> Kev
>>>>>>>> 
>>>>>>>> On Dec 1, 2014, at 4:20 PM, Steve King <steve at metrokings.com> wrote:
>>>>>>>> 
>>>>>>>>> Hello LLVM,
>>>>>>>>> 
>>>>>>>>> Previously, some folks wanted llvm-objdump to behave more like GNU
>>>>>>>>> objdump.  This could encompass both command line options and output
>>>>>>>>> format.  Such a change helps developers already familiar with GNU
>>>>>>>>> tools and allows re-use of Perl scripts or other automation expecting
>>>>>>>>> to see GNU style dumps.
>>>>>>>>> 
>>>>>>>>> Is moving llvm-objdump toward GNU objdump the general preference?  And
>>>>>>>>> what about otool-style output?
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> -steve
>>>>>>>>> _______________________________________________
>>>>>>>>> LLVM Developers mailing list
>>>>>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 




