[LLVMdev] MC disassembler for ARM

Fan Dawei fandawei.s at gmail.com
Thu Jun 7 07:53:25 PDT 2012


Hi Tim,

Thanks a lot for your help! I'm very grateful.

libc.so is a prelinked library. I'll build a non-prelinked one and try
again.

I'm now at the start of a binary translation project. I want to convert ARM
binary code [*] to LLVM IR, which is then translated to binary for our
MIPS-like architecture. That's why I'm looking for a decoder for ARM
binaries.

According to Evan, the ARMMCDisassembler is production quality, which is why
I'm so interested in it. However, I realized today that it might not be a
good choice. Although the disassembled MCInsts have a clean and simple
interface, their opcodes are auto-generated from the instruction description
files. There are a large number of them, and they do not correspond
one-to-one to ARM instructions. I don't think it is a good idea for our
translator to rely on the implementation of the LLVM ARM back-end, so I have
to find another decoder or implement one ourselves.

Thanks,
David

[*] In most cases, the targets are shared libraries in Android APKs built
with the NDK, like libangraybird.so. I think most of them are prelinked,
which is bad for us: because there are no $a, $t and $d symbols, we cannot
statically determine which regions are ARM code and which are Thumb code.


On Thu, Jun 7, 2012 at 8:11 PM, Tim Northover <t.p.northover at gmail.com> wrote:

> Hi David,
>
> On Thu, Jun 7, 2012 at 10:17 AM, Fan Dawei <fandawei.s at gmail.com> wrote:
> > Could you please tell me more about $a, $t and $d symbols? How these
> > symbols are used to define different regions? Where I can find this
> > symbols in ELF object file?
>
> At the start of each range of ARM code, an assembler or compiler
> should produce a "$a" symbol with that address, and put it (naturally
> enough) in the ELF symbol-table. Similarly each stretch of Thumb code
> gets a "$t" and each data a "$d".
>
> For example if I assemble:
>
>    .arm
>    mov r0, r3
>    ldr r2, Lit
> Lit:
>    .word 42
>    add r0, r0, r0
>    .thumb
>    mov r5, r2
>
> then the symbol table contains these entries:
>     4: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 $a
>     [...]
>     6: 00000008     0 NOTYPE  LOCAL  DEFAULT    1 $d
>     7: 0000000c     0 NOTYPE  LOCAL  DEFAULT    1 $a
>     8: 00000010     0 NOTYPE  LOCAL  DEFAULT    1 $t
>
> which shows that an ARM region begins at offset 0x0, a data one at
> offset 0x8, we switch back to ARM at 0xc and finally Thumb takes over
> at 0x10.
>
> GNU objdump hides the symbols by default when printing the
> symbol-table (you can give it the --special-syms option to show them),
> but readelf shows them always.
>
> If you want the really deep details, they're fully documented in the
> ARM ELF ABI here (section 4.6.5):
>
>
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044d/IHI0044D_aaelf.pdf
>
> Which is all nice to know, but I'm afraid it probably doesn't offer an
> immediate solution to the undefined instructions:
> + libc.so isn't a relocatable object file (well, it is dynamically,
> but that doesn't count).
> + llvm-objdump ignores them anyway at the moment, as far as I can tell.
>
> Tim.
>
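
The mapping-symbol scheme Tim describes can be sketched roughly as follows.
This is only an illustration in Python, not anything taken from LLVM: the
names KINDS, build_regions and kind_at are hypothetical, and the input is
assumed to be a list of (address, name) pairs already read out of the ELF
symbol table (for example from readelf output). It assumes the symbols are
present, so it does not help with a stripped or prelinked library.

```python
from bisect import bisect_right

# Mapping-symbol names and the kind of region each one starts.
# (Hypothetical table for this sketch; the names come from the ARM ELF ABI.)
KINDS = {"$a": "arm", "$t": "thumb", "$d": "data"}


def build_regions(symbols):
    """Turn (address, name) pairs into a sorted list of (start, kind).

    Symbols that are not mapping symbols are ignored. A suffixed form
    such as "$a.1" is treated the same as "$a".
    """
    regions = []
    for addr, name in sorted(symbols):
        base = name[:2]
        if base in KINDS and (len(name) == 2 or name[2] == "."):
            regions.append((addr, KINDS[base]))
    return regions


def kind_at(regions, addr):
    """Return the kind of the region containing addr, or None if addr
    lies before the first mapping symbol."""
    starts = [start for start, _ in regions]
    i = bisect_right(starts, addr) - 1
    return regions[i][1] if i >= 0 else None


# The four mapping symbols from Tim's example symbol table.
EXAMPLE = [(0x00, "$a"), (0x08, "$d"), (0x0C, "$a"), (0x10, "$t")]

if __name__ == "__main__":
    regions = build_regions(EXAMPLE)
    for addr in (0x00, 0x04, 0x08, 0x0C, 0x10):
        print(hex(addr), kind_at(regions, addr))
```

A decoder could call kind_at before each word to choose between the ARM and
Thumb decoders and to skip literal data such as the ".word 42" above.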