[LLVMdev] Looking for ideas on how to make llvm-objdump handle both arm and thumb disassembly from the same object file

Kevin Enderby enderby at apple.com
Wed Aug 6 13:18:08 PDT 2014


Hi Renato,

Thanks for your reply.  A few comments inline below.

Kev

On Aug 6, 2014, at 12:50 PM, Renato Golin <renato.golin at linaro.org> wrote:

> On 6 August 2014 19:31, Kevin Enderby <enderby at apple.com> wrote:
>> First a little background, the way darwin’s otool(1) does this is that it creates an llvm disassembler for both arm and thumb when disassembling a binary with 32-bit ARM cpu.  It uses the C API in <llvm-c/Disassembler.h> and calls LLVMCreateDisasmCPU() twice, once with an arm TripleName and once with a matching thumb TripleName.  Then for each 32-bit ARM cpu it will default to one or the other disassembler.  Then as it disassembles and finds a symbol in the symbol table for the current PC being disassembled it will see if the symbol has the N_ARM_THUMB_DEF bit set or not, and then switch between the arm and thumb disassemblers accordingly.  While this is a bit of a hack there are a limited set of Mach-O cpus otool(1) deals with.
> 
> Hi Kevin,
> 
> I guess it depends on how many other targets need to deal with the
> same problem, and how much their maintainers want to cope with the
> change on their side.

That’s the rub.  I think only 32-bit arm has this issue with multiple disassemblers
and I would hate to add a bunch of stuff that all targets would have to deal with.
Love to hear if any other target maintainer could even use this.

> 
> Creating multiple disassemblers is wasteful, but not critical to tools
> like objdump, that are rarely on the hot path. It would be
> simpler/quicker to instantiate them on objdump and then, based on the
> Thumb bit, it chooses one or the other.

I agree with all that.  And this would be pretty simple to deal with inside
objdump and be done with it.
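
To make the objdump-side approach concrete, here is a minimal sketch of just
the switching logic, written without any LLVM dependency.  Mode, SymbolTable,
selectMode() and modesFor() are hypothetical names made up for illustration;
the bool stored per symbol stands in for the N_ARM_THUMB_DEF bit, and each
Mode value stands in for one of the two disassembler instances.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for the two disassembler handles.
enum class Mode { ARM, Thumb };

// Hypothetical symbol table: address -> true if the symbol is a Thumb
// definition (the role N_ARM_THUMB_DEF plays for Mach-O symbols).
using SymbolTable = std::map<uint64_t, bool>;

// If a symbol sits exactly at PC, switch to the mode it names;
// otherwise keep using the current mode.
Mode selectMode(const SymbolTable &Symbols, uint64_t PC, Mode Current) {
  auto It = Symbols.find(PC);
  if (It == Symbols.end())
    return Current;
  return It->second ? Mode::Thumb : Mode::ARM;
}

// Walk instruction start addresses in ascending order and record which
// mode would be used for each one, starting from a per-cpu default.
std::vector<Mode> modesFor(const SymbolTable &Symbols,
                           const std::vector<uint64_t> &PCs, Mode Default) {
  std::vector<Mode> Result;
  Mode Current = Default;
  for (std::size_t I = 0; I < PCs.size(); ++I) {
    assert((I == 0 || PCs[I - 1] < PCs[I]) && "addresses must ascend");
    Current = selectMode(Symbols, PCs[I], Current);
    Result.push_back(Current);
  }
  return Result;
}
```

In a real tool the caller would pick the matching disassembler from the
returned mode before decoding each instruction; the decoding itself is left
out of this sketch.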

> However, I think this would
> not be the best solution for some reasons:
> 
> 1. This is disassembler logic, and having objdump doing this on a
> higher level means that other tools that (eventually) need the same
> functionality will have to re-implement.

Also agreed.  But can you or anyone else think of other tools that would
need this logic?  Hate to do a whole bunch of work adding to the lower
layers just to make objdump a bit cleaner.

> 2. It shouldn't be that hard to join the ARM and Thumb disassembler,
> given that they're on the same file, share most of the static
> functions and could easily delegate with getInstruction() deciding
> which to use: getThumbInstruction() or getARMInstruction() and
> renaming Thumb/ARMDisassembler functions to match.

Yep could do that.  But it seems like a lot of work for very little payoff.
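
For comparison, a rough sketch of the shape of the merged design described
in point 2.  This is not the existing ARMDisassembler code: the
MergedDisassembler class, the Decoded struct and the IsThumb parameter are
all assumptions made for illustration, and the two private decoders are
placeholders that only report which path was taken.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical result type: which decoder ran and how many bytes it used.
struct Decoded {
  std::string Kind;
  unsigned Size;
};

// Hypothetical single disassembler whose getInstruction() delegates.
class MergedDisassembler {
public:
  // IsThumb is an assumption: something still has to tell the decoder
  // which instruction set is active at this address.
  Decoded getInstruction(uint64_t Address, bool IsThumb) const {
    // ARM instructions are 4-byte aligned, Thumb instructions 2-byte aligned.
    assert(Address % (IsThumb ? 2 : 4) == 0 && "misaligned instruction");
    return IsThumb ? getThumbInstruction(Address) : getARMInstruction(Address);
  }

private:
  // Placeholder decoders; real ones would read and decode the bytes.
  Decoded getARMInstruction(uint64_t /*Address*/) const { return {"arm", 4}; }
  // Real Thumb instructions are 2 or 4 bytes; 2 is used here as a stand-in.
  Decoded getThumbInstruction(uint64_t /*Address*/) const {
    return {"thumb", 2};
  }
};
```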

> Though, I haven't got my hands on the disassembler that much, so other
> people with more experience in that area could chime in and give more
> reasons on either side.

Me too.

> My personal opinion is that it'd be more elegant and stable that way,

Absolutely agree about this being more elegant.

> and any work we have to do now would compensate in the future, but
> other back-end maintainers could disagree.

As a maintainer of darwin’s otool(1) for some 20 plus years and plugging in
some 9 or more different disassemblers, the use of two llvm disassemblers for
32-bit arm is no big deal at all.  Heck, it still has the old arm disassemblers in
it and many other old ones and it is very stable and I rarely if ever have to touch
those interfaces.

> 
> cheers,
> --renato




