[llvm] 72017e9 - [llvm-objdump, ARM] Fix big-endian AArch32 disassembly.
Simon Tatham via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 8 02:50:03 PDT 2022
Author: Simon Tatham
Date: 2022-08-08T10:49:51+01:00
New Revision: 72017e9b16b737c5bd7c1dd33abff36f368fa724
URL: https://github.com/llvm/llvm-project/commit/72017e9b16b737c5bd7c1dd33abff36f368fa724
DIFF: https://github.com/llvm/llvm-project/commit/72017e9b16b737c5bd7c1dd33abff36f368fa724.diff
LOG: [llvm-objdump,ARM] Fix big-endian AArch32 disassembly.
The ABI for big-endian AArch32, as specified by AAELF32, is above-
averagely complicated. Relocatable object files are expected to store
instruction encodings in byte order matching the ELF file's endianness
(so, big-endian for a BE ELF file). But executable images can
//either// do that //or// store instructions little-endian regardless
of data and ELF endianness (to support BE32 and BE8 platforms
respectively). They signal the latter by setting the EF_ARM_BE8 flag
in the ELF header.
(In the case of the Thumb instruction set, this all means that each
16-bit halfword of a Thumb instruction is stored in one or other
endianness. The two halfwords of a 32-bit Thumb instruction must
appear in the same order no matter what, because the first halfword is
the one that must avoid overlapping the encoding of any 16-bit Thumb
instruction.)
llvm-objdump was unconditionally expecting Arm instructions to be
stored little-endian. So it would correctly disassemble a BE8 image,
but if you gave it a BE32 image or a BE object file, it would retrieve
every instruction in byte-swapped form and disassemble it to
nonsense. (Even an object file output by LLVM itself, because
ARMMCCodeEmitter outputs instructions big-endian in big-endian mode,
which is correct for writing an object file.)
This patch allows llvm-objdump to correctly disassemble all three of
those classes of Arm ELF file. It does it by introducing a new
SubtargetFeature for big-endian instructions, setting it from the ELF
image type and flags during llvm-objdump setup, and teaching both
ARMDisassembler and llvm-objdump itself to pay attention to it when
retrieving instruction data from a section being disassembled.
Differential Revision: https://reviews.llvm.org/D130902
Added:
llvm/test/tools/llvm-objdump/ELF/ARM/be-disasm.test
Modified:
llvm/include/llvm/BinaryFormat/ELF.h
llvm/lib/ObjectYAML/ELFYAML.cpp
llvm/lib/Target/ARM/ARM.td
llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp
llvm/tools/llvm-objdump/llvm-objdump.cpp
Removed:
################################################################################
diff --git a/llvm/include/llvm/BinaryFormat/ELF.h b/llvm/include/llvm/BinaryFormat/ELF.h
index a0bb50db8c544..635ec25c90a68 100644
--- a/llvm/include/llvm/BinaryFormat/ELF.h
+++ b/llvm/include/llvm/BinaryFormat/ELF.h
@@ -435,6 +435,7 @@ enum : unsigned {
EF_ARM_ABI_FLOAT_SOFT = 0x00000200U, // EABI_VER5
EF_ARM_VFP_FLOAT = 0x00000400U, // Legacy pre EABI_VER5
EF_ARM_ABI_FLOAT_HARD = 0x00000400U, // EABI_VER5
+ EF_ARM_BE8 = 0x00800000U,
EF_ARM_EABI_UNKNOWN = 0x00000000U,
EF_ARM_EABI_VER1 = 0x01000000U,
EF_ARM_EABI_VER2 = 0x02000000U,
diff --git a/llvm/lib/ObjectYAML/ELFYAML.cpp b/llvm/lib/ObjectYAML/ELFYAML.cpp
index 9ad2c41351672..4308276c55a15 100644
--- a/llvm/lib/ObjectYAML/ELFYAML.cpp
+++ b/llvm/lib/ObjectYAML/ELFYAML.cpp
@@ -424,6 +424,7 @@ void ScalarBitSetTraits<ELFYAML::ELF_EF>::bitset(IO &IO,
BCaseMask(EF_ARM_EABI_VER3, EF_ARM_EABIMASK);
BCaseMask(EF_ARM_EABI_VER4, EF_ARM_EABIMASK);
BCaseMask(EF_ARM_EABI_VER5, EF_ARM_EABIMASK);
+ BCaseMask(EF_ARM_BE8, EF_ARM_BE8);
break;
case ELF::EM_MIPS:
BCase(EF_MIPS_NOREORDER);
diff --git a/llvm/lib/Target/ARM/ARM.td b/llvm/lib/Target/ARM/ARM.td
index 71388bc4efa4c..4da91e1166cbe 100644
--- a/llvm/lib/Target/ARM/ARM.td
+++ b/llvm/lib/Target/ARM/ARM.td
@@ -729,6 +729,24 @@ def FeatureHardenSlsNoComdat : SubtargetFeature<"harden-sls-nocomdat",
"HardenSlsNoComdat", "true",
"Generate thunk code for SLS mitigation in the normal text section">;
+//===----------------------------------------------------------------------===//
+// Endianness of instruction encodings in memory.
+//
+// In the current Arm architecture, this is usually little-endian regardless of
+// data endianness. But before Armv7 it was typical for instruction endianness
+// to match data endianness, so that a big-endian system was consistently big-
+// endian. And Armv7-R can be configured to use big-endian instructions.
+//
+// Additionally, even when targeting Armv7-A, big-endian instructions can be
+// found in relocatable object files, because the Arm ABI specifies that the
+// linker byte-reverses them depending on the target architecture.
+//
+// So we have a feature here to indicate that instructions are stored big-
+// endian, which you can set when instantiating an MCDisassembler.
+def ModeBigEndianInstructions : SubtargetFeature<"big-endian-instructions",
+ "BigEndianInstructions", "true",
+ "Expect instructions to be stored big-endian.">;
+
//===----------------------------------------------------------------------===//
// ARM Processor subtarget features.
//
diff --git a/llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp b/llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp
index f814959854059..f15cbb7c4fe55 100644
--- a/llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp
+++ b/llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp
@@ -131,6 +131,9 @@ class ARMDisassembler : public MCDisassembler {
public:
ARMDisassembler(const MCSubtargetInfo &STI, MCContext &Ctx) :
MCDisassembler(STI, Ctx) {
+ InstructionEndianness = STI.getFeatureBits()[ARM::ModeBigEndianInstructions]
+ ? llvm::support::big
+ : llvm::support::little;
}
~ARMDisassembler() override = default;
@@ -156,6 +159,8 @@ class ARMDisassembler : public MCDisassembler {
DecodeStatus AddThumbPredicate(MCInst&) const;
void UpdateThumbVFPPredicate(DecodeStatus &, MCInst&) const;
+
+ llvm::support::endianness InstructionEndianness;
};
} // end anonymous namespace
@@ -765,7 +770,8 @@ uint64_t ARMDisassembler::suggestBytesToSkip(ArrayRef<uint8_t> Bytes,
if (Bytes.size() < 2)
return 2;
- uint16_t Insn16 = (Bytes[1] << 8) | Bytes[0];
+ uint16_t Insn16 = llvm::support::endian::read<uint16_t>(
+ Bytes.data(), InstructionEndianness);
return Insn16 < 0xE800 ? 2 : 4;
}
@@ -794,9 +800,9 @@ DecodeStatus ARMDisassembler::getARMInstruction(MCInst &MI, uint64_t &Size,
return MCDisassembler::Fail;
}
- // Encoded as a small-endian 32-bit word in the stream.
- uint32_t Insn =
- (Bytes[3] << 24) | (Bytes[2] << 16) | (Bytes[1] << 8) | (Bytes[0] << 0);
+ // Encoded as a 32-bit word in the stream.
+ uint32_t Insn = llvm::support::endian::read<uint32_t>(Bytes.data(),
+ InstructionEndianness);
// Calling the auto-generated decoder function.
DecodeStatus Result =
@@ -1084,7 +1090,8 @@ DecodeStatus ARMDisassembler::getThumbInstruction(MCInst &MI, uint64_t &Size,
return MCDisassembler::Fail;
}
- uint16_t Insn16 = (Bytes[1] << 8) | Bytes[0];
+ uint16_t Insn16 = llvm::support::endian::read<uint16_t>(
+ Bytes.data(), InstructionEndianness);
DecodeStatus Result =
decodeInstruction(DecoderTableThumb16, MI, Insn16, Address, this, STI);
if (Result != MCDisassembler::Fail) {
@@ -1138,7 +1145,8 @@ DecodeStatus ARMDisassembler::getThumbInstruction(MCInst &MI, uint64_t &Size,
}
uint32_t Insn32 =
- (Bytes[3] << 8) | (Bytes[2] << 0) | (Bytes[1] << 24) | (Bytes[0] << 16);
+ (uint32_t(Insn16) << 16) | llvm::support::endian::read<uint16_t>(
+ Bytes.data() + 2, InstructionEndianness);
Result =
decodeInstruction(DecoderTableMVE32, MI, Insn32, Address, this, STI);
diff --git a/llvm/test/tools/llvm-objdump/ELF/ARM/be-disasm.test b/llvm/test/tools/llvm-objdump/ELF/ARM/be-disasm.test
new file mode 100644
index 0000000000000..3d8add2a52084
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/ELF/ARM/be-disasm.test
@@ -0,0 +1,91 @@
+# RUN: yaml2obj --docnum=1 -DCONTENT=FA000002E59F100CE0800001E12FFF1E4802EB00308047703141592627182818 %s | llvm-objdump -d --triple=armv7r - | FileCheck %s
+# RUN: yaml2obj --docnum=1 -DCONTENT=020000FA0C109FE5010080E01EFF2FE1024800EB803070473141592627182818 -DFLAG=,EF_ARM_BE8 %s | llvm-objdump -d --triple=armv7r - | FileCheck %s
+# RUN: yaml2obj --docnum=2 -DCONTENT=FA000002E59F100CE0800001E12FFF1E4802EB00308047703141592627182818 %s | llvm-objdump -d --triple=armv7r - | FileCheck %s
+
+## Test llvm-objdump disassembly of all three kinds of
+## AAELF32-compliant big-endian ELF file.
+##
+## In image files, by default AArch32 ELF stores the instructions
+## big-endian ('BE32' style), unless the EF_ARM_BE8 flag is set in the
+## ELF header, which indicates that instructions are stored
+## little-endian ('BE8' style). llvm-objdump should detect the flag and
+## handle both types, using the $a, $t and $d mapping symbols to
+## distinguish Arm instructions, Thumb instructions, and data.
+##
+## Relocatable object files always use the BE32 style. (The linker is
+## expected to byte-swap code sections, using the same the mapping
+## symbols to decide how, if it's going to generate an image with BE8
+## instruction endianness and the BE8 flag set.)
+##
+## This test checks all three cases of this. It provides llvm-objdump
+## with the BE32 and BE8 versions of the same image file, with the code
+## section byte-swapped, and the EF_ARM_BE8 flag absent and present
+## respectively to indicate that. We also provide a matching object
+## file. We expect the identical disassembly from both, apart from the
+## detail that addresses in the ELF images start at 0x8000 and section
+## offsets in the object start at 0.
+
+# CHECK: 0: fa000002 blx
+# CHECK-NEXT: 4: e59f100c ldr r1, [pc, #12]
+# CHECK-NEXT: 8: e0800001 add r0, r0, r1
+# CHECK-NEXT: c: e12fff1e bx lr
+# CHECK: 10: 4802 ldr r0, [pc, #8]
+# CHECK-NEXT: 12: eb00 3080 add.w r0, r0, r0, lsl #14
+# CHECK-NEXT: 16: 4770 bx lr
+# CHECK: 18: 31 41 59 26 .word 0x31415926
+# CHECK-NEXT: 1c: 27 18 28 18 .word 0x27182818
+
+--- !ELF
+FileHeader:
+ Class: ELFCLASS32
+ Data: ELFDATA2MSB
+ Type: ET_EXEC
+ Machine: EM_ARM
+ Flags: [ EF_ARM_EABI_UNKNOWN[[FLAG=]] ]
+ Entry: 0x8000
+ProgramHeaders:
+ - Type: PT_LOAD
+ Flags: [ PF_X, PF_R ]
+ FirstSec: .text
+ LastSec: .text
+ VAddr: 0x8000
+ Align: 0x4
+Sections:
+ - Name: .text
+ Type: SHT_PROGBITS
+ Flags: [ SHF_ALLOC, SHF_EXECINSTR ]
+ Address: 0x8000
+ AddressAlign: 0x4
+ Content: [[CONTENT]]
+Symbols:
+ - Name: '$a'
+ Section: .text
+ Value: 0x8000
+ - Name: '$t'
+ Section: .text
+ Value: 0x8010
+ - Name: '$d'
+ Section: .text
+ Value: 0x8018
+
+--- !ELF
+FileHeader:
+ Class: ELFCLASS32
+ Data: ELFDATA2MSB
+ Type: ET_REL
+ Machine: EM_ARM
+Sections:
+ - Name: .text
+ Type: SHT_PROGBITS
+ Flags: [ SHF_ALLOC, SHF_EXECINSTR ]
+ AddressAlign: 0x4
+ Content: [[CONTENT]]
+Symbols:
+ - Name: '$a'
+ Section: .text
+ - Name: '$t'
+ Section: .text
+ Value: 0x10
+ - Name: '$d'
+ Section: .text
+ Value: 0x18
diff --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp
index fd83dc197fe9a..efac3a883b5a2 100644
--- a/llvm/tools/llvm-objdump/llvm-objdump.cpp
+++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp
@@ -690,14 +690,14 @@ class ARMPrettyPrinter : public PrettyPrinter {
OS << ' '
<< format_hex_no_prefix(
llvm::support::endian::read<uint16_t>(
- Bytes.data() + Pos, llvm::support::little),
+ Bytes.data() + Pos, InstructionEndianness),
4);
} else {
for (; Pos + 4 <= End; Pos += 4)
OS << ' '
<< format_hex_no_prefix(
llvm::support::endian::read<uint32_t>(
- Bytes.data() + Pos, llvm::support::little),
+ Bytes.data() + Pos, InstructionEndianness),
8);
}
if (Pos < End) {
@@ -713,6 +713,13 @@ class ARMPrettyPrinter : public PrettyPrinter {
} else
OS << "\t<unknown>";
}
+
+ void setInstructionEndianness(llvm::support::endianness Endianness) {
+ InstructionEndianness = Endianness;
+ }
+
+private:
+ llvm::support::endianness InstructionEndianness = llvm::support::little;
};
ARMPrettyPrinter ARMPrettyPrinterInst;
@@ -1852,6 +1859,29 @@ static void disassembleObject(ObjectFile *Obj, bool InlineRelocs) {
if (MCPU.empty())
MCPU = Obj->tryGetCPUName().value_or("").str();
+ if (isArmElf(*Obj)) {
+ // When disassembling big-endian Arm ELF, the instruction endianness is
+ // determined in a complex way. In relocatable objects, AAELF32 mandates
+ // that instruction endianness matches the ELF file endianness; in
+ // executable images, that's true unless the file header has the EF_ARM_BE8
+ // flag, in which case instructions are little-endian regardless of data
+ // endianness.
+ //
+ // We must set the big-endian-instructions SubtargetFeature to make the
+ // disassembler read the instructions the right way round, and also tell
+ // our own prettyprinter to retrieve the encodings the same way to print in
+ // hex.
+ const auto *Elf32BE = dyn_cast<ELF32BEObjectFile>(Obj);
+
+ if (Elf32BE && (Elf32BE->isRelocatableObject() ||
+ !(Elf32BE->getPlatformFlags() & ELF::EF_ARM_BE8))) {
+ Features.AddFeature("+big-endian-instructions");
+ ARMPrettyPrinterInst.setInstructionEndianness(llvm::support::big);
+ } else {
+ ARMPrettyPrinterInst.setInstructionEndianness(llvm::support::little);
+ }
+ }
+
std::unique_ptr<const MCSubtargetInfo> STI(
TheTarget->createMCSubtargetInfo(TripleName, MCPU, Features.getString()));
if (!STI)
More information about the llvm-commits
mailing list