Hi Tim,<div><br></div><div>Thanks a lot for your help! I'm very grateful.</div><div><br></div><div>libc.so is a prelinked library; I'll build a non-prelinked one and try again.</div><div><br></div><div>I'm just starting a binary translation project. I want to convert ARM binary code [*] to LLVM IR, which is then translated to binary code for our MIPS-like architecture. That's why I'm looking for a decoder for ARM binaries.</div>
<div><br></div><div>As Evan told me, the ARMMCDisassembler is production quality, which is why I'm so interested in it. However, I realized today that it might not be a good choice. Although the disassembled MCInsts have a clean and simple interface, their opcodes are auto-generated from the instruction description files. There are a great many of them, and they do not correspond one-to-one to ARM instructions. I don't think our translator should depend on the implementation details of the LLVM ARM back-end, so we will have to find another decoder or implement one ourselves.</div>
<div><br></div><div><div>Thanks,</div><div>David</div></div><div><br></div><div>[*] In most cases, the targets are the shared libraries in Android APKs built with the NDK, such as libangraybird.so. I think most of them are prelinked, which is bad for us: because they have no $a, $t and $d symbols, we cannot statically determine which regions are ARM code and which are Thumb code.</div>
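<div><br></div><div>When the mapping symbols are present, turning them into an ARM/Thumb/data map is straightforward: a region runs from one mapping symbol up to the next, so the kind of any address is given by the last mapping symbol at or below it. The Python sketch below illustrates that lookup under some stated assumptions: all symbols belong to a single section, the helper names (mapping_kind, build_map, classify) are hypothetical, and the sample symbols come from the readelf listing in Tim's reply, with one ordinary symbol added to show that it is ignored.</div>

```python
import bisect
from typing import List, Optional, Tuple

# Mapping-symbol prefixes from the ARM ELF ABI: "$a" starts ARM code,
# "$t" starts Thumb code and "$d" starts data. A ".suffix" may follow.
KINDS = {"$a": "arm", "$t": "thumb", "$d": "data"}


def mapping_kind(name: str) -> Optional[str]:
    """Return 'arm', 'thumb' or 'data' for a mapping symbol, else None."""
    return KINDS.get(name.split(".", 1)[0])


def build_map(symbols: List[Tuple[str, int]]) -> Tuple[List[int], List[str]]:
    """Keep only mapping symbols and sort them by address.

    Hypothetical helper; assumes every symbol lives in the same section.
    """
    entries = sorted(
        (addr, mapping_kind(name)) for name, addr in symbols if mapping_kind(name)
    )
    return [addr for addr, _ in entries], [kind for _, kind in entries]


def classify(addr: int, starts: List[int], kinds: List[str]) -> Optional[str]:
    """Return the kind of the region containing addr.

    The governing symbol is the last one whose address is <= addr;
    addresses before the first mapping symbol are unknown (None).
    """
    i = bisect.bisect_right(starts, addr) - 1
    return kinds[i] if i >= 0 else None


if __name__ == "__main__":
    # Symbols from the readelf listing in Tim's reply, plus an ordinary one.
    syms = [("$a", 0x0), ("main", 0x0), ("$d", 0x8), ("$a", 0xC), ("$t", 0x10)]
    starts, kinds = build_map(syms)
    for addr in (0x0, 0x4, 0x8, 0xC, 0x10, 0x12):
        print(hex(addr), classify(addr, starts, kinds))
```

Using a sorted list plus bisect keeps each lookup logarithmic, which matters if a translator has to classify every branch target it discovers.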
<div><br></div><div><br><div class="gmail_quote">On Thu, Jun 7, 2012 at 8:11 PM, Tim Northover <span dir="ltr"><<a href="mailto:t.p.northover@gmail.com" target="_blank">t.p.northover@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi David,<br>
<div class="im"><br>
On Thu, Jun 7, 2012 at 10:17 AM, Fan Dawei <<a href="mailto:fandawei.s@gmail.com">fandawei.s@gmail.com</a>> wrote:<br>
> Could you please tell me more about $a, $t and $d symbols? How these symbols<br>
> are used to define different regions? Where I can find this symbols in ELF<br>
> object file?<br>
<br>
</div>At the start of each range of ARM code, an assembler or compiler<br>
should produce a "$a" symbol with that address, and put it (naturally<br>
enough) in the ELF symbol table. Similarly, each stretch of Thumb code<br>
gets a "$t" and each stretch of data a "$d".<br>
<br>
For example if I assemble:<br>
<br>
.arm<br>
mov r0, r3<br>
ldr r2, Lit<br>
Lit:<br>
.word 42<br>
add r0, r0, r0<br>
.thumb<br>
mov r5, r2<br>
<br>
then the symbol table contains these entries:<br>
4: 00000000 0 NOTYPE LOCAL DEFAULT 1 $a<br>
[...]<br>
6: 00000008 0 NOTYPE LOCAL DEFAULT 1 $d<br>
7: 0000000c 0 NOTYPE LOCAL DEFAULT 1 $a<br>
8: 00000010 0 NOTYPE LOCAL DEFAULT 1 $t<br>
<br>
which shows that an ARM region begins at offset 0x0, a data one at<br>
offset 0x8, we switch back to ARM at 0xc and finally Thumb takes over<br>
at 0x10.<br>
<br>
GNU objdump hides these symbols by default when printing the<br>
symbol table (you can give it the --special-syms option to show them),<br>
but readelf always shows them.<br>
<br>
If you want the really deep details, they're fully documented in the<br>
ARM ELF ABI here (section 4.6.5):<br>
<br>
<a href="http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044d/IHI0044D_aaelf.pdf" target="_blank">http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044d/IHI0044D_aaelf.pdf</a><br>
<br>
Which is all nice to know, but I'm afraid it probably doesn't offer an<br>
immediate solution to the undefined instructions:<br>
+ libc.so isn't a relocatable object file (well, it is dynamically,<br>
but that doesn't count).<br>
+ llvm-objdump ignores them anyway at the moment, as far as I can tell.<br>
<span class="HOEnZb"><font color="#888888"><br>
Tim.<br>
</font></span></blockquote></div><br></div>
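<div><br></div><div>To check whether a given library still carries mapping symbols without going through objdump or readelf, the static symbol table can also be read directly. The following is a minimal Python sketch, not a complete ELF parser. It assumes a 32-bit little-endian ELF (the usual format for ARM Android libraries) that still has a SHT_SYMTAB section, it ignores corner cases such as extended section numbering, and read_mapping_symbols is a hypothetical name. Mapping symbols are local symbols, so they would not normally appear in .dynsym; on a stripped library the sketch is therefore expected to return an empty list.</div>

```python
import struct
import sys
from typing import List, Tuple

SHT_SYMTAB = 2          # section type of the static symbol table
ELF32_SYM_SIZE = 16     # sizeof(Elf32_Sym)


def read_mapping_symbols(data: bytes) -> List[Tuple[str, int]]:
    """Return (name, st_value) for every $a/$t/$d symbol in SHT_SYMTAB sections."""
    # e_ident: magic, EI_CLASS == ELFCLASS32 (1), EI_DATA == ELFDATA2LSB (1).
    if data[:4] != b"\x7fELF" or data[4] != 1 or data[5] != 1:
        raise ValueError("not a 32-bit little-endian ELF file")
    # Elf32_Ehdr fields: e_shoff at 0x20, e_shentsize and e_shnum at 0x2E.
    (e_shoff,) = struct.unpack_from("<I", data, 0x20)
    e_shentsize, e_shnum = struct.unpack_from("<HH", data, 0x2E)
    # Each Elf32_Shdr is ten 32-bit words: name, type, flags, addr, offset,
    # size, link, info, addralign, entsize.
    shdrs = [
        struct.unpack_from("<10I", data, e_shoff + i * e_shentsize)
        for i in range(e_shnum)
    ]
    found = []
    for sh in shdrs:
        if sh[1] != SHT_SYMTAB:
            continue
        str_base = shdrs[sh[6]][4]  # sh_link names the string table section
        entsize = sh[9] or ELF32_SYM_SIZE
        for off in range(sh[4], sh[4] + sh[5], entsize):
            # Elf32_Sym starts with st_name (string offset) and st_value.
            st_name, st_value = struct.unpack_from("<II", data, off)
            start = str_base + st_name
            name = data[start:data.index(b"\0", start)].decode("ascii", "replace")
            # Mapping symbols are "$a", "$t", "$d", optionally followed by ".xxx".
            if name[:2] in ("$a", "$t", "$d") and (len(name) == 2 or name[2] == "."):
                found.append((name, st_value))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for name, value in read_mapping_symbols(f.read()):
            print(f"{value:08x} {name}")
```

Given a file path on the command line, it prints one line per mapping symbol (value, then name), in the same spirit as the readelf listing above.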