[all-commits] [llvm/llvm-project] 72017e: [llvm-objdump, ARM] Fix big-endian AArch32 disassem...

Simon Tatham via All-commits all-commits at lists.llvm.org
Mon Aug 8 02:50:10 PDT 2022


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 72017e9b16b737c5bd7c1dd33abff36f368fa724
      https://github.com/llvm/llvm-project/commit/72017e9b16b737c5bd7c1dd33abff36f368fa724
  Author: Simon Tatham <simon.tatham at arm.com>
  Date:   2022-08-08 (Mon, 08 Aug 2022)

  Changed paths:
    M llvm/include/llvm/BinaryFormat/ELF.h
    M llvm/lib/ObjectYAML/ELFYAML.cpp
    M llvm/lib/Target/ARM/ARM.td
    M llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp
    A llvm/test/tools/llvm-objdump/ELF/ARM/be-disasm.test
    M llvm/tools/llvm-objdump/llvm-objdump.cpp

  Log Message:
  -----------
  [llvm-objdump,ARM] Fix big-endian AArch32 disassembly.

The ABI for big-endian AArch32, as specified by AAELF32, is more
complicated than most. Relocatable object files are expected to store
instruction encodings in the byte order matching the ELF file's
endianness (so, big-endian for a BE ELF file). Executable images,
however, can either do the same or store instructions little-endian
regardless of data and ELF endianness (to support BE32 and BE8
platforms respectively). They signal the latter by setting the
EF_ARM_BE8 flag in the ELF header.
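
The rule in the paragraph above can be sketched in a few lines of
Python. This is an illustration only, not code from the patch: the
function name arm_instruction_byteorder is hypothetical, and it reads
the standard ELF32 header fields directly rather than going through any
LLVM API. The constants are the usual ELF values.

```python
import struct

EM_ARM = 40
ELFCLASS32 = 1
ELFDATA2LSB, ELFDATA2MSB = 1, 2
ET_REL = 1
EF_ARM_BE8 = 0x00800000


def arm_instruction_byteorder(header: bytes) -> str:
    """Return 'little' or 'big': the byte order of instructions in a
    32-bit Arm ELF file, given at least the first 40 bytes of the file."""
    if len(header) < 40 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    if header[4] != ELFCLASS32:
        raise ValueError("not a 32-bit ELF file")

    data = header[5]
    if data not in (ELFDATA2LSB, ELFDATA2MSB):
        raise ValueError("unknown ELF data encoding")
    endian = "<" if data == ELFDATA2LSB else ">"

    # ELF32 layout: e_type at offset 16, e_machine at 18, e_flags at 36.
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    (e_flags,) = struct.unpack_from(endian + "I", header, 36)
    if e_machine != EM_ARM:
        raise ValueError("not an Arm ELF file")

    if data == ELFDATA2LSB:
        return "little"      # LE file: instructions are little-endian
    if e_type == ET_REL:
        return "big"         # BE relocatable object: instructions match the file
    if e_flags & EF_ARM_BE8:
        return "little"      # BE8 image: BE data, LE instructions
    return "big"             # BE32 image: everything big-endian
```

Only a big-endian executable image with EF_ARM_BE8 set ends up with
instructions in a different byte order from the rest of the file.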

(In the case of the Thumb instruction set, this all means that each
16-bit halfword of a Thumb instruction is stored in one or the other
endianness. The two halfwords of a 32-bit Thumb instruction must
appear in the same order no matter what, because the first halfword is
the one that must avoid overlapping the encoding of any 16-bit Thumb
instruction.)
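
As a rough sketch of the Thumb case (again illustrative, with
hypothetical function names, and not taken from the patch): the byte
order applies within each halfword, while the order of the two
halfwords stays fixed. A first halfword whose top five bits are
0b11101, 0b11110 or 0b11111 marks a 32-bit instruction.

```python
def is_thumb32_first_halfword(hw: int) -> bool:
    """True if this halfword starts a 32-bit Thumb instruction."""
    return (hw >> 11) in (0b11101, 0b11110, 0b11111)


def read_thumb_instruction(data: bytes, offset: int, byteorder: str):
    """Return (encoding, size_in_bytes) for the Thumb instruction at offset.

    byteorder is 'little' or 'big' and is applied to each 16-bit halfword
    separately; the first halfword always comes first in memory.
    """
    if offset + 2 > len(data):
        raise ValueError("truncated instruction")
    hw1 = int.from_bytes(data[offset:offset + 2], byteorder)
    if not is_thumb32_first_halfword(hw1):
        return hw1, 2

    if offset + 4 > len(data):
        raise ValueError("truncated 32-bit instruction")
    hw2 = int.from_bytes(data[offset + 2:offset + 4], byteorder)
    return (hw1 << 16) | hw2, 4
```

For example, the 32-bit encoding 0xF000F800 appears as the bytes
00 f0 00 f8 when stored little-endian and as f0 00 f8 00 when stored
big-endian; in both cases the 0xF000 halfword comes first.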

llvm-objdump was unconditionally expecting Arm instructions to be
stored little-endian. So it would correctly disassemble a BE8 image,
but if you gave it a BE32 image or a BE object file, it would retrieve
every instruction in byte-swapped form and disassemble it to
nonsense. (This affected even object files output by LLVM itself,
because ARMMCCodeEmitter outputs instructions big-endian in big-endian
mode, which is correct for writing an object file.)
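
The failure mode is easy to reproduce by hand. The snippet below is a
minimal illustration, not llvm-objdump code; fetch_arm_word is a
hypothetical helper. It stores the A32 encoding of "mov r0, r0"
big-endian, as a BE object file would, and then fetches it both ways.

```python
MOV_R0_R0 = 0xE1A00000  # A32 encoding of "mov r0, r0"


def fetch_arm_word(section: bytes, offset: int, byteorder: str) -> int:
    """Read one 32-bit Arm instruction word in the given byte order."""
    return int.from_bytes(section[offset:offset + 4], byteorder)


# Section contents as they would appear in a BE relocatable object.
be_object_bytes = MOV_R0_R0.to_bytes(4, "big")

correct = fetch_arm_word(be_object_bytes, 0, "big")     # 0xE1A00000
swapped = fetch_arm_word(be_object_bytes, 0, "little")  # 0x0000A0E1
```

The byte-swapped value is a different encoding altogether, which is why
the old output bore no relation to the original code.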

This patch allows llvm-objdump to correctly disassemble all three of
those classes of Arm ELF file. It does so by introducing a new
SubtargetFeature for big-endian instructions, setting it from the ELF
image type and flags during llvm-objdump setup, and teaching both
ARMDisassembler and llvm-objdump itself to pay attention to it when
retrieving instruction data from a section being disassembled.

Differential Revision: https://reviews.llvm.org/D130902



