[all-commits] [llvm/llvm-project] 55f1fb: [MC, llvm-objdump, ARM] Target-dependent disassembly...

Tue Jul 26 01:35:46 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 55f1fbf005fef1e4024b2b44db0842f23fc5ea64
      https://github.com/llvm/llvm-project/commit/55f1fbf005fef1e4024b2b44db0842f23fc5ea64
  Author: Simon Tatham <simon.tatham at arm.com>
  Date:   2022-07-26 (Tue, 26 Jul 2022)

  Changed paths:
    M llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
    M llvm/lib/MC/MCDisassembler/MCDisassembler.cpp
    M llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp
    M llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.h
    M llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp
    A llvm/test/tools/llvm-objdump/ELF/ARM/unknown-instr-resync.test
    M llvm/test/tools/llvm-objdump/ELF/ARM/unknown-instr.test
    M llvm/tools/llvm-objdump/llvm-objdump.cpp
    M llvm/tools/sancov/sancov.cpp

  Log Message:
  -----------
  [MC,llvm-objdump,ARM] Target-dependent disassembly resync policy.

Currently, when llvm-objdump is disassembling a code section and
encounters a point where no instruction can be decoded, it uses the
same policy on all targets: consume one byte of the section, emit it
as "<unknown>", and try disassembling from the next byte position.

On an architecture where instructions are always 4 bytes long and
4-byte aligned, this makes no sense at all. If a 4-byte word cannot be
decoded as an instruction, then the next place that a valid
instruction could possibly be found is 4 bytes further on.
Disassembling from a misaligned address can't possibly produce
anything that the code generator intended, or that the CPU would even
attempt to execute.

This patch introduces a new MCDisassembler virtual method called
`suggestBytesToSkip`, which allows each target to choose its own
resynchronization policy. For Arm (as opposed to Thumb) and AArch64,
I've filled in the new method to return a fixed width of 4.

Thumb is a more interesting case, because the criterion for
identifying 2-byte and 4-byte instruction encodings is very simple,
and doesn't require the particular instruction to be recognized. So
`suggestBytesToSkip` is also passed an ArrayRef of the bytes in
question, so that it can take that into account. The new test case
shows Thumb disassembly skipping over two unrecognized instructions,
and identifying one as 2-byte and one as 4-byte.
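
As an illustration of the Thumb width criterion described above, here
is a minimal standalone sketch. It is not the code from this patch:
the function name `thumbBytesToSkip` is hypothetical, it takes a
`std::vector` rather than an ArrayRef, and it assumes the halfwords
are stored little-endian. It relies on the architectural rule that a
halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the
first half of a 4-byte encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical standalone helper, not the LLVM method itself.
// Returns how many bytes to skip after a failed decode in a Thumb
// instruction stream, assuming little-endian halfword storage.
std::uint64_t thumbBytesToSkip(const std::vector<std::uint8_t> &Bytes) {
  // Without a full halfword to inspect, fall back to the smallest
  // sensible step for Thumb, which is one halfword.
  if (Bytes.size() < 2)
    return 2;

  // Assemble the first halfword from its little-endian bytes.
  const std::uint16_t First =
      static_cast<std::uint16_t>(Bytes[0] | (Bytes[1] << 8));

  // Values of 0xE800 and above have top five bits 0b11101, 0b11110 or
  // 0b11111, which mark the first half of a 4-byte encoding. Anything
  // lower is a complete 2-byte instruction.
  return First < 0xE800 ? 2 : 4;
}
```

With this sketch, the byte pair `00 bf` (halfword 0xBF00) gives 2 and
the byte pair `ff f7` (halfword 0xF7FF) gives 4, without either
instruction having to be recognized.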

For targets other than Arm and AArch64, this is NFC: the base class
implementation of `suggestBytesToSkip` still returns 1, so that the
existing behavior is unchanged. Other targets can fill in their own
implementations as they see fit; I haven't attempted to choose a new
behavior for each one myself.

I've updated all the call sites of `MCDisassembler::getInstruction` in
llvm-objdump, and also one in sancov, which was the only other place I
spotted the same idiom of `if (Size == 0) Size = 1` after a call to
`getInstruction`.
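
To show how the call-site idiom changes, here is a rough standalone
sketch of a disassembly loop. Everything in it is hypothetical
(`walkSection`, `DecodeFn`, `SkipFn`); it models the decoder and the
resync policy as plain callables instead of using MCDisassembler.
Clamping the suggested skip to the bytes remaining in the section is
an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
// Hypothetical decoder: returns the instruction size, or 0 on failure.
using DecodeFn = std::function<std::uint64_t(const Bytes &, std::size_t)>;
// Hypothetical resync policy: how far to advance after a failure.
using SkipFn = std::function<std::uint64_t(const Bytes &, std::size_t)>;

// Walks a section and returns every offset at which decoding was tried.
std::vector<std::size_t> walkSection(const Bytes &Section,
                                     const DecodeFn &Decode,
                                     const SkipFn &SuggestSkip) {
  std::vector<std::size_t> Starts;
  std::size_t Offset = 0;
  while (Offset < Section.size()) {
    Starts.push_back(Offset);
    std::uint64_t Size = Decode(Section, Offset);
    if (Size == 0) {
      // Old idiom: Size = 1. New idiom: ask the policy, but never step
      // past the end of the section.
      const std::uint64_t Remaining = Section.size() - Offset;
      Size = std::min<std::uint64_t>(Remaining, SuggestSkip(Section, Offset));
    }
    assert(Size > 0 && "the loop must always make progress");
    Offset += static_cast<std::size_t>(Size);
  }
  return Starts;
}
```

With a decoder that always fails on a 6-byte section, a policy that
returns 1 tries offsets 0 through 5, whereas a policy that returns 4
tries only offsets 0 and 4.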

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D130357

  Commit: 2b38f589301d7defef6099b57ecf45139010a5a7
      https://github.com/llvm/llvm-project/commit/2b38f589301d7defef6099b57ecf45139010a5a7
  Author: Simon Tatham <simon.tatham at arm.com>
  Date:   2022-07-26 (Tue, 26 Jul 2022)

  Changed paths:
    M lld/test/COFF/arm-thumb-thunks-multipass.s
    M lld/test/COFF/arm-thumb-thunks.s
    M lld/test/COFF/arm64-delayimport.yaml
    M lld/test/COFF/arm64-import2.test
    M lld/test/COFF/arm64-relocs-imports.test
    M lld/test/COFF/arm64-thunks.s
    M lld/test/COFF/armnt-blx23t.test
    M lld/test/COFF/armnt-branch24t.test
    M lld/test/COFF/armnt-mov32t-exec.test
    M lld/test/COFF/armnt-movt32t.test
    M lld/test/COFF/delayimports-armnt.yaml
    M lld/test/ELF/aarch64-cortex-a53-843419-address.s
    M lld/test/ELF/aarch64-cortex-a53-843419-large.s
    M lld/test/ELF/aarch64-cortex-a53-843419-recognize.s
    M lld/test/ELF/aarch64-cortex-a53-843419-tlsrelax.s
    M lld/test/ELF/aarch64-relocs.s
    M lld/test/ELF/arm-bl-v6-inrange.s
    M lld/test/ELF/arm-bl-v6.s
    M lld/test/ELF/arm-blx.s
    M lld/test/ELF/arm-branch-undef-weak-plt-thunk.s
    M lld/test/ELF/arm-exidx-order.s
    M lld/test/ELF/arm-force-pi-thunk.s
    M lld/test/ELF/arm-got-relative.s
    M lld/test/ELF/arm-icf-exidx.s
    M lld/test/ELF/arm-long-thunk-converge.s
    M lld/test/ELF/arm-reloc-abs32.s
    M lld/test/ELF/arm-sbrel32.s
    M lld/test/ELF/arm-thumb-branch.s
    M lld/test/ELF/arm-thumb-condbranch-thunk.s
    M lld/test/ELF/arm-thumb-interwork-thunk-v5.s
    M lld/test/ELF/arm-thumb-mix-range-thunk-os.s
    M lld/test/ELF/arm-thumb-narrow-branch-check.s
    M lld/test/ELF/arm-thumb-plt-range-thunk-os.s
    M lld/test/ELF/arm-thumb-plt-reloc.s
    M lld/test/ELF/arm-thumb-range-thunk-os.s
    M lld/test/ELF/arm-thumb-thunk-empty-pass.s
    M lld/test/ELF/arm-thumb-thunk-v6m.s
    M lld/test/ELF/arm-thumb-undefined-weak-narrow.test
    M lld/test/ELF/arm-thunk-edgecase.s
    M lld/test/ELF/arm-thunk-linkerscript-dotexpr.s
    M lld/test/ELF/arm-thunk-linkerscript-large.s
    M lld/test/ELF/arm-thunk-linkerscript-orphan.s
    M lld/test/ELF/arm-thunk-linkerscript-sort.s
    M lld/test/ELF/arm-thunk-linkerscript.s
    M lld/test/ELF/arm-thunk-multipass.s
    M lld/test/ELF/arm-thunk-nosuitable.s
    M lld/test/ELF/arm-thunk-re-add.s
    M lld/test/ELF/arm-tls-gd32.s
    M lld/test/ELF/arm-tls-ie32.s
    M lld/test/ELF/arm-tls-ldm32.s
    M lld/test/ELF/arm-tls-le32.s
    M llvm/test/tools/llvm-objdump/ELF/AArch64/disassemble-align.s
    M llvm/test/tools/llvm-objdump/ELF/AArch64/elf-aarch64-mapping-symbols.test
    M llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-dwarf4.s
    M llvm/test/tools/llvm-objdump/ELF/ARM/debug-vars-wide-chars.s
    M llvm/test/tools/llvm-objdump/ELF/ARM/invalid-instruction.s
    M llvm/test/tools/llvm-objdump/ELF/ARM/unknown-instr-resync.test
    M llvm/test/tools/llvm-objdump/ELF/ARM/unknown-instr.test
    M llvm/test/tools/llvm-objdump/ELF/ARM/v5t-subarch.s
    M llvm/test/tools/llvm-objdump/ELF/ARM/v5te-subarch.s
    M llvm/test/tools/llvm-objdump/ELF/ARM/v5tej-subarch.s
    M llvm/test/tools/llvm-objdump/ELF/ARM/v6-subarch.s
    M llvm/test/tools/llvm-objdump/ELF/ARM/v6-subfeatures.s
    M llvm/test/tools/llvm-objdump/ELF/ARM/v6k-subarch.s
    M llvm/test/tools/llvm-objdump/ELF/ARM/v6m-subarch.s
    M llvm/test/tools/llvm-objdump/ELF/ARM/v6t2-subarch.s
    M llvm/test/tools/llvm-objdump/ELF/ARM/v7a-subfeature.s
    M llvm/test/tools/llvm-objdump/ELF/ARM/v7m-subarch.s
    M llvm/test/tools/llvm-objdump/ELF/ARM/v7m-subfeatures.s
    M llvm/test/tools/llvm-objdump/ELF/ARM/v7r-subfeatures.s
    M llvm/test/tools/llvm-objdump/ELF/ARM/v8a-subarch.s
    M llvm/test/tools/llvm-objdump/ELF/ARM/v8r-subarch.s
    M llvm/test/tools/llvm-objdump/MachO/AArch64/pc-rel-targets.test
    M llvm/tools/llvm-objdump/llvm-objdump.cpp

  Log Message:
  -----------
  [llvm-objdump,ARM] Add PrettyPrinters for Arm and AArch64.

Most Arm disassemblers, including GNU objdump and Arm's own `fromelf`,
emit an instruction's raw encoding as a 32-bit word or (for Thumb)
one or two 16-bit halfwords, in logical order rather than according to
their storage endianness. This is generally easier to read: it matches
the encoding diagrams in the architecture spec, it matches the value
you'd write in a `.inst` directive, and it means that fields within
the instruction encoding that span more than one byte (such as branch
offsets or `SVC` immediates) can be read directly in the encoding
without having to mentally reverse the bytes.

llvm-objdump already has a system of PrettyPrinter subclasses which
makes it easy for a target to drop in its own preferred formatting.
This patch adds pretty-printers for all the Arm targets, so that
llvm-objdump will display Arm instruction encodings in their preferred
layout instead of little-endian and bytewise.
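
The difference between the two layouts can be sketched as follows.
This is an illustration only, not the PrettyPrinter code from the
patch: `formatBytewise` and `formatLogical` are hypothetical names,
the input is assumed to be stored little-endian, and trailing bytes
that do not fill a whole unit are simply dropped.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Prints each byte separately, in storage order.
std::string formatBytewise(const std::vector<std::uint8_t> &Bytes) {
  std::string Out;
  for (std::size_t I = 0; I < Bytes.size(); ++I) {
    char Buf[4];
    std::snprintf(Buf, sizeof Buf, "%02x", static_cast<unsigned>(Bytes[I]));
    if (I != 0)
      Out += ' ';
    Out += Buf;
  }
  return Out;
}

// Groups little-endian bytes into units of UnitBytes (4 for an Arm or
// AArch64 word, 2 for a Thumb halfword) and prints each unit as one
// hex number, most significant byte first.
std::string formatLogical(const std::vector<std::uint8_t> &Bytes,
                          std::size_t UnitBytes) {
  std::string Out;
  if (UnitBytes == 0)
    return Out;
  for (std::size_t Start = 0; Start + UnitBytes <= Bytes.size();
       Start += UnitBytes) {
    if (Start != 0)
      Out += ' ';
    // In little-endian storage the most significant byte comes last,
    // so walk each unit backwards.
    for (std::size_t I = UnitBytes; I-- > 0;) {
      char Buf[4];
      std::snprintf(Buf, sizeof Buf, "%02x",
                    static_cast<unsigned>(Bytes[Start + I]));
      Out += Buf;
    }
  }
  return Out;
}
```

For the Arm encoding 0xEF123456 (an `SVC` with immediate 0x123456),
stored as the bytes 56 34 12 ef, the bytewise form is `56 34 12 ef`
and the logical form is `ef123456`, in which the immediate can be read
directly. For a Thumb pair stored as 00 f0 00 f8, the logical form
with 2-byte units is `f000 f800`.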

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D130358

  Commit: 1bc7b06ffd9bf54ef6a507d49151f45fd904b8fd
      https://github.com/llvm/llvm-project/commit/1bc7b06ffd9bf54ef6a507d49151f45fd904b8fd
  Author: Simon Tatham <simon.tatham at arm.com>
  Date:   2022-07-26 (Tue, 26 Jul 2022)

  Changed paths:
    M llvm/test/tools/llvm-objdump/ELF/AArch64/disassemble-align.s
    M llvm/tools/llvm-objdump/llvm-objdump.cpp

  Log Message:
  -----------
  [llvm-objdump,ARM] Make dumpARMELFData line up with instructions.

The whitespace in output lines containing disassembled instructions
did not match that in the `.word` lines produced when dumping literal
pools and other data in Arm ELF files. This patch adjusts
`dumpARMELFData` so that it uses the same alignment scheme as the
instruction pretty-printers, so the two kinds of line now line up with
each other.

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D130359

Compare: https://github.com/llvm/llvm-project/compare/c4b6e5f9500f...1bc7b06ffd9b