[PATCH] D24636: llvm-objdump Allow disassembly of ARM and thumb code mix in ELF object file.

Mon Sep 19 02:23:14 PDT 2016

peter.smith added a comment.

I don't have much to add beyond what Renato and others have already said on llvm-dev. I understand that this is probably close to the best incremental change that could be made to llvm-objdump, but as it stands I think it is too intrusive a change just for ARM, and we need to be clearer about the requirements in some areas, such as the use of attributes.

Some comments that might be useful in the future:
It would be good to write down the scope of disassembly you are aiming for. For example, is it architectural disassembly, where instructions not supported on the target architecture come out as undefined (e.g. disassembling an ARMv7-A object for an ARMv5 target), or universal disassembly, where any bit pattern that is a legal ARM or Thumb instruction in some architecture is disassembled regardless of the target architecture?

Are you intending to support stripped binaries with no mapping symbols or static symbol table? There are overlaps between ARM and Thumb bit patterns, and of course with literal data, so I fear that even trying to do this may cause more problems than it solves.

I think that putting attribute reading code directly into llvm-objdump, potentially duplicating code in llvm-readobj, isn't the right thing to do. We should have an attribute reading/writing library that tools can use. This would be really useful for lld, for example.

Disassembly is a property of the architecture, not of the CPU name. It is true that a CPU name implies a default set of attributes, but there are ways to alter these defaults and end up with different properties in the object file. If we are going to read the attributes, I think we should read the architecture and the various supporting attributes to work out what the target and subtarget features are. In an ideal world we shouldn't be making any disassembly decisions based on the CPU name alone.

When mapping symbols aren't available but the static symbol table is still present, it is possible to use the state of the last STT_FUNC symbol definition (bit 0 == 1 for Thumb, bit 0 == 0 for ARM) to determine whether the code is ARM or Thumb. This won't work if there is a state change or literal data without another STT_FUNC or STT_OBJECT symbol, but it is a reasonable heuristic.
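
To make the heuristic concrete, here is a rough sketch in standalone C++. It does not use any LLVM API; the Sym, SymType and CodeState types and the stateAt function are hypothetical names made up for illustration, and it assumes the symbols have already been sorted by address with bit 0 cleared.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of a static symbol table entry.
// These are illustration-only types, not LLVM classes.
enum class SymType { Func, Object, Other }; // STT_FUNC, STT_OBJECT, anything else
enum class CodeState { ARM, Thumb, Data, Unknown };

struct Sym {
  uint32_t Value; // st_value exactly as stored (bit 0 set for Thumb functions)
  SymType Type;
};

// Return the state implied by the last function or object symbol at or
// below Addr. Assumes SortedSyms is sorted by address with bit 0 cleared.
CodeState stateAt(const std::vector<Sym> &SortedSyms, uint32_t Addr) {
  CodeState State = CodeState::Unknown;
  for (const Sym &S : SortedSyms) {
    // Bit 0 of a function symbol is the Thumb marker, not part of the address.
    uint32_t Start = S.Type == SymType::Func ? (S.Value & ~1u) : S.Value;
    if (Start > Addr)
      break;
    if (S.Type == SymType::Func)
      State = (S.Value & 1u) ? CodeState::Thumb : CodeState::ARM;
    else if (S.Type == SymType::Object)
      State = CodeState::Data;
    // Other symbol types carry no state information and are skipped.
  }
  return State;
}
```

As noted above, this guesses wrong for a literal pool or state change that has no symbol of its own: everything up to the next function or object symbol is attributed to the previous one.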

I think that any llvm-objdump design is going to end up with an ARM disassembler and a Thumb disassembler; however, I think there may be neater ways to switch between them in a refactored llvm-objdump. For example, the mapping symbols identify non-overlapping ranges of addresses that are each ARM, Thumb or literal data, and there is one disassembler for each kind of range. A design that in effect does DisassembleRange(Start, End, Disassembler) would work reasonably well.
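
As a rough sketch of what I mean, in standalone C++ rather than against the real llvm-objdump code: the MappingSym, Range, buildRanges and disassembleSection names below are hypothetical, and the three callbacks stand in for whatever the ARM, Thumb and data printers turn out to be. It assumes the mapping symbols for a section are sorted by strictly increasing address.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

enum class Kind { ARM, Thumb, Data };

// Hypothetical, simplified records; not LLVM types.
struct MappingSym { uint64_t Addr; std::string Name; };
struct Range { uint64_t Start, End; Kind K; }; // half-open [Start, End)

// Recognise $a, $t and $d, optionally followed by ".<suffix>".
bool classify(const std::string &Name, Kind &K) {
  if (Name.size() < 2 || Name[0] != '$')
    return false;
  if (Name.size() > 2 && Name[2] != '.')
    return false;
  switch (Name[1]) {
  case 'a': K = Kind::ARM; return true;
  case 't': K = Kind::Thumb; return true;
  case 'd': K = Kind::Data; return true;
  default: return false;
  }
}

// Turn mapping symbols into non-overlapping ranges covering the section
// from the first mapping symbol up to SectionEnd.
std::vector<Range> buildRanges(const std::vector<MappingSym> &Sorted,
                               uint64_t SectionEnd) {
  std::vector<Range> Ranges;
  for (const MappingSym &S : Sorted) {
    Kind K;
    if (!classify(S.Name, K) || S.Addr >= SectionEnd)
      continue;
    if (!Ranges.empty()) {
      if (Ranges.back().K == K)
        continue; // same state as before, so the current range just extends
      Ranges.back().End = S.Addr; // close the previous range here
    }
    Ranges.push_back(Range{S.Addr, SectionEnd, K});
  }
  return Ranges;
}

// Stand-in for DisassembleRange(Start, End, Disassembler): one callback
// per kind of range.
using RangeFn = std::function<void(uint64_t Start, uint64_t End)>;

void disassembleSection(const std::vector<Range> &Ranges, const RangeFn &ARM,
                        const RangeFn &Thumb, const RangeFn &Data) {
  for (const Range &R : Ranges) {
    switch (R.K) {
    case Kind::ARM: ARM(R.Start, R.End); break;
    case Kind::Thumb: Thumb(R.Start, R.End); break;
    case Kind::Data: Data(R.Start, R.End); break;
    }
  }
}
```

The point of the split is that the decision about which disassembler to use is made once per range, so the per-instruction loop never needs to know about ARM/Thumb switching.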

Repository:
  rL LLVM

https://reviews.llvm.org/D24636