[LLVMdev] MC disassembler for ARM

Jim Grosbach grosbach at apple.com
Fri Jun 8 09:18:29 PDT 2012


That depends on how you define "one ARM instruction." It's not a clear-cut thing. For example, is "add r1, r2, r3" the same ARM instruction as "add r1, r2, #4"? What counts as a distinct instruction and what counts as a variant encoding of the same instruction is often entirely a matter of convenience.

-Jim
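
To make the "add r1, r2, r3" versus "add r1, r2, #4" question concrete, here is a minimal Python sketch that splits the two A32 encodings into fields. It assumes the classic ARM data-processing layout (cond, I bit, opcode, S, Rn, Rd, operand2); the helper names `decode_data_processing` and `describe_add` are made up for illustration and are not LLVM APIs. Both words carry the same ADD opcode field and differ only in the I bit and in how operand2 is read, which is the sense in which they can be called one instruction or two. As far as I know, LLVM's ARM td files model these forms as separate opcodes (ADDrr, ADDri and so on).

```python
# Sketch only: classic A32 data-processing layout (an assumption stated in the lead-in):
#   cond[31:28] 00 I[25] opcode[24:21] S[20] Rn[19:16] Rd[15:12] operand2[11:0]

ADD_OPCODE = 0b0100  # opcode field value for ADD


def decode_data_processing(word):
    """Split a 32-bit A32 data-processing word into its fields."""
    return {
        "cond": (word >> 28) & 0xF,
        "imm_form": bool((word >> 25) & 1),  # I bit: 1 means operand2 is an immediate
        "opcode": (word >> 21) & 0xF,
        "set_flags": bool((word >> 20) & 1),
        "rn": (word >> 16) & 0xF,
        "rd": (word >> 12) & 0xF,
        "operand2": word & 0xFFF,
    }


def describe_add(word):
    """Render an ADD word as text; handles only immediate and unshifted-register forms."""
    f = decode_data_processing(word)
    if f["opcode"] != ADD_OPCODE:
        raise ValueError("not an ADD data-processing word")
    if f["imm_form"]:
        # Immediate form: 8-bit value rotated right by twice the 4-bit rotate field.
        rot = 2 * ((f["operand2"] >> 8) & 0xF)
        imm8 = f["operand2"] & 0xFF
        value = ((imm8 >> rot) | (imm8 << (32 - rot))) & 0xFFFFFFFF
        src = "#%d" % value
    else:
        if f["operand2"] >> 4:
            raise ValueError("shifted-register forms are not handled in this sketch")
        src = "r%d" % (f["operand2"] & 0xF)
    return "add r%d, r%d, %s" % (f["rd"], f["rn"], src)


reg_form = 0xE0821003  # add r1, r2, r3
imm_form = 0xE2821004  # add r1, r2, #4
print(describe_add(reg_form))
print(describe_add(imm_form))
```

A decoder built along these lines can either report one "ADD" with an operand-kind flag or two opcodes; a translator that consumes MCInsts would have to pick a grouping of its own either way.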


On Jun 8, 2012, at 6:40 AM, Fan Dawei <fandawei.s at gmail.com> wrote:

> Hi Jim,
> 
> Thanks for the reply. I'm sorry I didn't make myself clear enough.
> 
> The MCInst created by the MCDisassembler depends on the instructions defined in the td files. These instructions do not have a one-to-one mapping to ARM instructions: usually one or more instructions defined in the td files correspond to one actual ARM instruction.
> 
> Thanks,
> David
> 
> On Thu, Jun 7, 2012 at 1:27 PM, Jim Grosbach <grosbach at apple.com> wrote:
> 
> On Jun 7, 2012, at 7:53 AM, Fan Dawei <fandawei.s at gmail.com> wrote:
> 
>> Hi Tim,
>> 
>> Thanks a lot for your help! I'm very grateful.
>> 
>> libc.so is a prelinked library, so I'll build a non-prelinked one and try again.
>> 
>> I'm now at the start of a binary translation project. I want to convert ARM binary code [*] to LLVM IR, which is then translated to binary for our MIPS-like architecture. That's why I'm looking for a decoder for ARM binaries.
>> 
>> Evan told me the ARMMCDisassembler is production quality, which is why I'm so interested in it. However, I realized today that it might not be a good choice. Although the disassembled MCInsts have a clean and simple interface, their opcodes are auto-generated from the instruction description files. There are a large number of them, and they do not have a one-to-one correspondence to ARM instructions. I think it is not a good idea for our translator to rely on the implementation of the LLVM ARM back-end, so we have to find another decoder or implement one ourselves.
> 
> Every MCInst created by the MCDisassembler will have a one-to-one mapping to an actual ARM instruction.
> 
>> 
>> Thanks,
>> David
>> 
>> [*] In most cases, the targets are the shared libraries in Android APKs developed with the NDK, like libangraybird.so. I think most of them are prelinked, which is bad for us: because there are no $a, $t and $d symbols, we cannot statically figure out which regions are ARM code and which are Thumb code.
>> 
>> 
>> On Thu, Jun 7, 2012 at 8:11 PM, Tim Northover <t.p.northover at gmail.com> wrote:
>> Hi David,
>> 
>> On Thu, Jun 7, 2012 at 10:17 AM, Fan Dawei <fandawei.s at gmail.com> wrote:
>> > Could you please tell me more about $a, $t and $d symbols? How are these
>> > symbols used to define different regions? Where can I find these symbols
>> > in the ELF object file?
>> 
>> At the start of each range of ARM code, an assembler or compiler
>> should produce a "$a" symbol with that address, and put it (naturally
>> enough) in the ELF symbol-table. Similarly each stretch of Thumb code
>> gets a "$t" and each stretch of data a "$d".
>> 
>> For example if I assemble:
>> 
>>    .arm
>>    mov r0, r3
>>    ldr r2, Lit
>> Lit:
>>    .word 42
>>    add r0, r0, r0
>>    .thumb
>>    mov r5, r2
>> 
>> then the symbol table contains these entries:
>>     4: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 $a
>>     [...]
>>     6: 00000008     0 NOTYPE  LOCAL  DEFAULT    1 $d
>>     7: 0000000c     0 NOTYPE  LOCAL  DEFAULT    1 $a
>>     8: 00000010     0 NOTYPE  LOCAL  DEFAULT    1 $t
>> 
>> which shows that an ARM region begins at offset 0x0, a data one at
>> offset 0x8, we switch back to ARM at 0xc and finally Thumb takes over
>> at 0x10.
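
The region reading above can be sketched in Python by walking `readelf -s`-style lines, keeping the $a/$t/$d entries, and pairing each address with the next one. This is an illustrative sketch rather than anything from LLVM or binutils: the helper names `parse_mapping_symbols` and `regions` are hypothetical, the column layout is assumed to match the table shown, all symbols are assumed to belong to one section, and the section end of 0x12 is an assumption based on the final Thumb `mov` being a 2-byte encoding.

```python
# Sketch only: assumes readelf-style columns "Num: Value Size Type Bind Vis Ndx Name".
KINDS = {"$a": "ARM", "$t": "Thumb", "$d": "data"}


def parse_mapping_symbols(readelf_lines):
    """Return sorted (address, kind) pairs for the $a/$t/$d entries."""
    marks = []
    for line in readelf_lines:
        fields = line.split()
        if len(fields) != 8:
            continue  # skip headers, elided rows, and anything oddly shaped
        # The ARM ELF ABI also permits suffixed names such as "$a.0".
        base = fields[7].split(".", 1)[0]
        if base in KINDS:
            marks.append((int(fields[1], 16), KINDS[base]))
    return sorted(marks)


def regions(marks, end):
    """Turn mapping marks into (start, stop, kind) ranges; `end` is the section size."""
    out = []
    for i, (start, kind) in enumerate(marks):
        stop = marks[i + 1][0] if i + 1 < len(marks) else end
        out.append((start, stop, kind))
    return out


symtab = [
    "    4: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 $a",
    "    [...]",
    "    6: 00000008     0 NOTYPE  LOCAL  DEFAULT    1 $d",
    "    7: 0000000c     0 NOTYPE  LOCAL  DEFAULT    1 $a",
    "    8: 00000010     0 NOTYPE  LOCAL  DEFAULT    1 $t",
]

SECTION_END = 0x12  # assumed section size, see the note above
for start, stop, kind in regions(parse_mapping_symbols(symtab), SECTION_END):
    print("0x%02x-0x%02x %s" % (start, stop, kind))
```

A disassembler driver could use such ranges to choose the ARM or Thumb decoder for each range and to skip data ranges entirely.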
>> 
>> GNU objdump hides the symbols by default when printing the
>> symbol-table (you can give it the --special-syms option to show them),
>> but readelf shows them always.
>> 
>> If you want the really deep details, they're fully documented in the
>> ARM ELF ABI here (section 4.6.5):
>> 
>> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044d/IHI0044D_aaelf.pdf
>> 
>> Which is all nice to know, but I'm afraid it probably doesn't offer an
>> immediate solution to the undefined instructions:
>> + libc.so isn't a relocatable object file (well, it is dynamically,
>> but that doesn't count).
>> + llvm-objdump ignores them anyway at the moment, as far as I can tell.
>> 
>> Tim.
>> 
> 
> 
