[LLVMdev] Looking for ideas on how to make llvm-objdump handle both arm and thumb disassembly from the same object file

Kevin Enderby enderby at apple.com
Wed Aug 6 11:31:10 PDT 2014


Hello Tim, Rafael, Renato and llvmdev,

I’m working to get llvm-objdump to handle both arm and thumb disassembly from the same object file, similar to how darwin’s otool(1) works, and I’m looking for direction on the implementation.  I spoke to Jim Grosbach about some ideas and he suggested I send out an email about some of the possibilities.  Since none of the ones I could think of are pretty, he thought maybe you would have some thoughts or suggestions.

First a little background: the way darwin’s otool(1) does this is that it creates an llvm disassembler for both arm and thumb when disassembling a binary with a 32-bit ARM cpu.  It uses the C API in <llvm-c/Disassembler.h> and calls LLVMCreateDisasmCPU() twice, once with an arm TripleName and once with a matching thumb TripleName.  Then for each 32-bit ARM cpu it will default to one or the other disassembler.  As it disassembles, when it finds a symbol in the symbol table for the current PC being disassembled, it checks whether the symbol has the N_ARM_THUMB_DEF bit set and switches between the arm and thumb disassemblers accordingly.  While this is a bit of a hack, there is a limited set of Mach-O cpus otool(1) deals with.
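To make the switching rule concrete, here is a minimal sketch of just the mode-selection step, written without any LLVM dependency.  It is not otool(1)'s code: the Sym record, the Mode enum and modeAt() are hypothetical names, and kThumbDefBit is assumed to mirror the value of N_ARM_THUMB_DEF (0x0008) from <mach-o/nlist.h>.  In a real tool each Mode would map to one of the two contexts returned by LLVMCreateDisasmCPU().

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: mirrors the Mach-O n_desc bit N_ARM_THUMB_DEF (0x0008).
constexpr uint16_t kThumbDefBit = 0x0008;

enum class Mode { ARM, Thumb };

// Hypothetical, simplified symbol record: address plus n_desc flags.
struct Sym {
  uint64_t addr;
  uint16_t desc;
};

// Returns the mode to use at `pc`.  If a symbol is defined exactly at
// `pc`, its Thumb bit decides; otherwise the current mode is kept.
// `sortedSyms` must be sorted by address.
Mode modeAt(const std::vector<Sym> &sortedSyms, uint64_t pc, Mode current) {
  assert(std::is_sorted(sortedSyms.begin(), sortedSyms.end(),
                        [](const Sym &a, const Sym &b) {
                          return a.addr < b.addr;
                        }));
  auto it = std::lower_bound(
      sortedSyms.begin(), sortedSyms.end(), pc,
      [](const Sym &s, uint64_t a) { return s.addr < a; });
  if (it == sortedSyms.end() || it->addr != pc)
    return current; // no symbol here: stay with the current disassembler
  return (it->desc & kThumbDefBit) ? Mode::Thumb : Mode::ARM;
}
```

The caller would invoke modeAt() before decoding each instruction and pass the result to whichever disassembler matches.  Note that the mode only changes at symbol addresses, which is why the initial default per cpu still matters for code that precedes the first symbol.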

llvm-objdump, by contrast, eventually just calls TheTarget->createMCDisassembler() and gets a single disassembler for the TheTarget it created.

I talked to Jim a bit about sinking the logic of maintaining multiple disassemblers down into the core disassembler logic and using the subtarget to select between them, the way the ARMAsmParser and, I think, the ARMInstPrinter work.  But that seems very complicated for a single target that has two disassemblers.

The implementation of llvm-objdump does have a MachODump.cpp for use with the -m option, where I could do a similar otool(1)-like hack and special case 32-bit ARM cpus.  At least that contains the ugliness.  But this does not really help the non -m case, and I suspect ELF objects may face a similar problem.
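One way the "contained" version could look is a small wrapper that owns both disassemblers and hands back the right one per instruction.  The sketch below is only an illustration under stated assumptions: Disasm, FakeArm, FakeThumb and ArmThumbPair are hypothetical names, and Disasm is a stand-in, not the MCDisassembler interface.  The fake decoders only report how many bytes they would consume, to show why the PC advances differently in each mode.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a per-mode disassembler; this is NOT the
// MCDisassembler interface, only enough to show the dispatch.
struct Disasm {
  virtual ~Disasm() = default;
  // Writes a label to `out` and returns the number of bytes consumed.
  virtual unsigned decode(const uint8_t *bytes, std::string &out) const = 0;
};

struct FakeArm : Disasm {
  unsigned decode(const uint8_t *, std::string &out) const override {
    out = "arm";
    return 4; // ARM-mode instructions are always 4 bytes
  }
};

struct FakeThumb : Disasm {
  unsigned decode(const uint8_t *bytes, std::string &out) const override {
    out = "thumb";
    // Little-endian first halfword; top five bits 0b11101, 0b11110 or
    // 0b11111 mark a 32-bit Thumb-2 instruction, anything else is 16-bit.
    unsigned hw = bytes[0] | (static_cast<unsigned>(bytes[1]) << 8);
    return (hw >> 11) >= 0x1D ? 4 : 2;
  }
};

enum class Mode { ARM, Thumb };

// Owns one disassembler per mode and selects between them.
class ArmThumbPair {
  std::unique_ptr<Disasm> arm_, thumb_;

public:
  ArmThumbPair(std::unique_ptr<Disasm> arm, std::unique_ptr<Disasm> thumb)
      : arm_(std::move(arm)), thumb_(std::move(thumb)) {}

  const Disasm &select(Mode m) const {
    return m == Mode::Thumb ? *thumb_ : *arm_;
  }
};
```

With something like this the special casing stays in one place: the Mach-O dumping loop asks for select(mode) before each decode and everything else is unchanged.  It does nothing for the non -m path, which is the limitation noted above.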

The other, more radical change I was thinking of was maybe changing MachODump.cpp to use the C API.  Then at least we would have something in the tree that used it and could actually have test cases.  That could then use the callbacks for symbolic operands etc.  But that could still be done with the C++ API using TheTarget->createMCSymbolizer() anyway.

So if any of you have suggestions for a direction for this let me know,
Kev


