[llvm] [RISCV][MC] Implement evaluateBranch for auipc+jalr pairs (PR #65480)
Job Noorman via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 3 23:35:12 PDT 2023
https://github.com/mtvec updated https://github.com/llvm/llvm-project/pull/65480
>From 72286701c48d722a7fbb43b4e1ca36b4f99ef2e1 Mon Sep 17 00:00:00 2001
From: Job Noorman <jnoorman at igalia.com>
Date: Wed, 6 Sep 2023 11:29:28 +0200
Subject: [PATCH 1/5] [MC][NFC] Allow MCInstrAnalysis to store state
Currently, all the analysis functions provided by `MCInstrAnalysis` work
on a single instruction. On some targets, this limits the kind of
instructions that can be successfully analyzed as common constructs may
need multiple instructions.
For example, a typical call sequence on RISC-V uses an auipc+jalr pair.
In order to analyse the jalr inside `evaluateBranch`, information about
the corresponding auipc is needed. Similarly, AArch64 uses adrp+ldr
pairs to access globals.
This patch proposes to add state to `MCInstrAnalysis` to support these
use cases. Two new virtual methods are added:
- `updateState`: takes an instruction and its address. This method
should be called by clients on every instruction; it allows targets to
store whatever information they need to analyse future instructions.
- `resetState`: clears the state whenever it becomes irrelevant. Clients
could call this, for example, when starting to disassemble a new
function.
Note that the default implementations do nothing so this patch is NFC.
No actual state is stored inside `MCInstrAnalysis`; deciding the
structure of the state is left to the targets.
This patch also modifies llvm-objdump to use the new interface.
This patch is an alternative to D116677 and the idea of storing state in
`MCInstrAnalysis` was first discussed there.
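
The client protocol described above can be sketched in isolation. This is a
minimal model and not LLVM code: `ToyInst`, `ToyAnalysis`, `CountingAnalysis`
and `walkFunction` are hypothetical names invented for illustration. Only the
hook names `resetState`/`updateState` and their intended call order are taken
from this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for MCInst; only the encoded size matters here.
struct ToyInst {
  uint64_t Size;
};

// Hypothetical stand-in for MCInstrAnalysis with the two new hooks.
// As in the patch, the default implementations do nothing.
class ToyAnalysis {
public:
  virtual ~ToyAnalysis() = default;
  virtual void resetState() {}
  virtual void updateState(const ToyInst &, uint64_t) {}
};

// A toy target that records how the client drives the hooks.
class CountingAnalysis : public ToyAnalysis {
public:
  unsigned Resets = 0;
  std::vector<uint64_t> SeenAddrs;

  void resetState() override {
    ++Resets;
    SeenAddrs.clear();
  }
  void updateState(const ToyInst &, uint64_t Addr) override {
    SeenAddrs.push_back(Addr);
  }
};

// Client side: reset once when a new function starts, then feed every
// instruction together with its address.
inline void walkFunction(ToyAnalysis &A, const std::vector<ToyInst> &Insts,
                         uint64_t Start) {
  A.resetState();
  uint64_t Addr = Start;
  for (const ToyInst &I : Insts) {
    assert(I.Size > 0 && "instructions must occupy at least one byte");
    // Analysis queries for I would go here, before the state is updated.
    A.updateState(I, Addr);
    Addr += I.Size;
  }
}
```

In this sketch the query for an instruction happens before `updateState` is
called for it, so the state seen by a query only reflects earlier
instructions.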
---
llvm/include/llvm/MC/MCInstrAnalysis.h | 15 +++++++++++++++
llvm/tools/llvm-objdump/llvm-objdump.cpp | 21 ++++++++++++++++-----
2 files changed, 31 insertions(+), 5 deletions(-)
diff --git a/llvm/include/llvm/MC/MCInstrAnalysis.h b/llvm/include/llvm/MC/MCInstrAnalysis.h
index c3c675c39c5590c..dac12af599e6f34 100644
--- a/llvm/include/llvm/MC/MCInstrAnalysis.h
+++ b/llvm/include/llvm/MC/MCInstrAnalysis.h
@@ -37,6 +37,21 @@ class MCInstrAnalysis {
MCInstrAnalysis(const MCInstrInfo *Info) : Info(Info) {}
virtual ~MCInstrAnalysis() = default;
+ /// Clear the internal state. See updateState for more information.
+ virtual void resetState() {}
+
+ /// Update internal state with \p Inst at \p Addr.
+ ///
+ /// For some types of analyses, inspecting a single instruction is not
+ /// sufficient. Some examples are auipc/jalr pairs on RISC-V or adrp/ldr pairs
+ /// on AArch64. To support inspecting multiple instructions, targets may keep
+ /// track of an internal state while analysing instructions. Clients should
+ /// call updateState for every instruction which allows later calls to one of
+ /// the analysis functions to take previous instructions into account.
+ /// Whenever state becomes irrelevant (e.g., when starting to disassemble a
+ /// new function), clients should call resetState to clear it.
+ virtual void updateState(const MCInst &Inst, uint64_t Addr) {}
+
virtual bool isBranch(const MCInst &Inst) const {
return Info->get(Inst.getOpcode()).isBranch();
}
diff --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp
index 96d74d6e2d5e865..8f6479d3c6e31e4 100644
--- a/llvm/tools/llvm-objdump/llvm-objdump.cpp
+++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp
@@ -842,7 +842,7 @@ class DisassemblerTarget {
std::unique_ptr<const MCSubtargetInfo> SubtargetInfo;
std::shared_ptr<MCContext> Context;
std::unique_ptr<MCDisassembler> DisAsm;
- std::shared_ptr<const MCInstrAnalysis> InstrAnalysis;
+ std::shared_ptr<MCInstrAnalysis> InstrAnalysis;
std::shared_ptr<MCInstPrinter> InstPrinter;
PrettyPrinter *Printer;
@@ -1265,14 +1265,19 @@ collectBBAddrMapLabels(const std::unordered_map<uint64_t, BBAddrMap> &AddrToBBAd
}
}
-static void collectLocalBranchTargets(
- ArrayRef<uint8_t> Bytes, const MCInstrAnalysis *MIA, MCDisassembler *DisAsm,
- MCInstPrinter *IP, const MCSubtargetInfo *STI, uint64_t SectionAddr,
- uint64_t Start, uint64_t End, std::unordered_map<uint64_t, std::string> &Labels) {
+static void
+collectLocalBranchTargets(ArrayRef<uint8_t> Bytes, MCInstrAnalysis *MIA,
+ MCDisassembler *DisAsm, MCInstPrinter *IP,
+ const MCSubtargetInfo *STI, uint64_t SectionAddr,
+ uint64_t Start, uint64_t End,
+ std::unordered_map<uint64_t, std::string> &Labels) {
// So far only supports PowerPC and X86.
if (!STI->getTargetTriple().isPPC() && !STI->getTargetTriple().isX86())
return;
+ if (MIA)
+ MIA->resetState();
+
Labels.clear();
unsigned LabelCount = 0;
Start += SectionAddr;
@@ -1298,6 +1303,7 @@ static void collectLocalBranchTargets(
!Labels.count(Target) &&
!(STI->getTargetTriple().isPPC() && Target == Index))
Labels[Target] = ("L" + Twine(LabelCount++)).str();
+ MIA->updateState(Inst, Index);
}
Index += Size;
}
@@ -1939,6 +1945,9 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
BBAddrMapLabels);
}
+ if (DT->InstrAnalysis)
+ DT->InstrAnalysis->resetState();
+
while (Index < End) {
// ARM and AArch64 ELF binaries can interleave data and text in the
// same section. We rely on the markers introduced to understand what
@@ -2155,6 +2164,8 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
if (TargetOS == &CommentStream)
*TargetOS << "\n";
}
+
+ DT->InstrAnalysis->updateState(Inst, SectionAddr + Index);
}
}
>From 2b9d08907bcf8c2fb16b2f35f39b68824b35bd1e Mon Sep 17 00:00:00 2001
From: Job Noorman <jnoorman at igalia.com>
Date: Wed, 6 Sep 2023 11:29:28 +0200
Subject: [PATCH 2/5] [RISCV][MC] Implement evaluateBranch for auipc+jalr pairs
This patch implements `MCInstrAnalysis` state in order to be able to
analyze auipc+jalr pairs inside `evaluateBranch`.
This is implemented as follows:
- State: array of currently known GPR values;
- Whenever an auipc is detected in `updateState`, update the state value
of RD with the immediate;
- Whenever a jalr is detected in `evaluateBranch`, check if the state
holds a value for RS1 and use that to compute its target.
Note that this is similar to how binutils implements it and the output
of llvm-objdump should now mostly match that of GNU objdump.
This patch also updates the relevant llvm-objdump tests and adds a new
one testing the output for interleaved auipc+jalr pairs.
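
The arithmetic behind the three steps above can be sketched on its own. This
is an illustration under stated assumptions, not the code in this patch:
`AuipcTracker`, `onAuipc` and `jalrTarget` are hypothetical names, and
registers are passed as plain numbers (0 for x0, 5 for t0, 6 for t1) rather
than LLVM register enums.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical tracker mirroring the state described above: one optional
// value per GPR x1..x31. x0 is hard-wired to zero and needs no slot.
class AuipcTracker {
  std::optional<int64_t> GPR[31];

public:
  // auipc rd, imm20: rd = pc + (imm20 << 12). Multiplication is used here
  // instead of a shift so negative immediates are well defined in every
  // C++ language mode.
  void onAuipc(unsigned Rd, int64_t Imm20, uint64_t Addr) {
    assert(Rd < 32 && "invalid GPR number");
    if (Rd != 0)
      GPR[Rd - 1] = static_cast<int64_t>(Addr) + Imm20 * 4096;
  }

  // jalr rd, imm12(rs1): the target is rs1 + imm12 when rs1 is known.
  std::optional<uint64_t> jalrTarget(unsigned Rs1, int64_t Imm12) const {
    assert(Rs1 < 32 && "invalid GPR number");
    if (Rs1 == 0)
      return static_cast<uint64_t>(Imm12);
    if (GPR[Rs1 - 1])
      return static_cast<uint64_t>(*GPR[Rs1 - 1] + Imm12);
    return std::nullopt;
  }
};
```

With the layout of the new test (auipc t0 at address 0, auipc t1 at address
4), `jalr 16(t0)` and `jalr 12(t1)` both resolve to address 16, where `bar`
is placed.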
---
.../RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp | 39 +++++++++++++++++++
.../tools/llvm-objdump/ELF/RISCV/branches.s | 4 +-
.../ELF/RISCV/multi-instr-target.s | 18 +++++++++
3 files changed, 59 insertions(+), 2 deletions(-)
create mode 100644 llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
index 75af5c2de09469b..848f5b7e6d66fea 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
@@ -114,10 +114,40 @@ static MCTargetStreamer *createRISCVNullTargetStreamer(MCStreamer &S) {
namespace {
class RISCVMCInstrAnalysis : public MCInstrAnalysis {
+ std::optional<int64_t> GPRState[31];
+
+ void setGPRState(unsigned Reg, int64_t Value) {
+ assert(Reg >= RISCV::X0 && Reg <= RISCV::X31 && "Invalid GPR reg");
+
+ if (Reg != RISCV::X0)
+ GPRState[Reg - RISCV::X1] = Value;
+ }
+
+ std::optional<int64_t> getGPRState(unsigned Reg) const {
+ assert(Reg >= RISCV::X0 && Reg <= RISCV::X31 && "Invalid GPR reg");
+
+ if (Reg == RISCV::X0)
+ return 0;
+ return GPRState[Reg - RISCV::X1];
+ }
+
public:
explicit RISCVMCInstrAnalysis(const MCInstrInfo *Info)
: MCInstrAnalysis(Info) {}
+ void resetState() override {
+ std::fill(std::begin(GPRState), std::end(GPRState), std::nullopt);
+ }
+
+ void updateState(const MCInst &Inst, uint64_t Addr) override {
+ switch (Inst.getOpcode()) {
+ case RISCV::AUIPC:
+ setGPRState(Inst.getOperand(0).getReg(),
+ Addr + (Inst.getOperand(1).getImm() << 12));
+ break;
+ }
+ }
+
bool evaluateBranch(const MCInst &Inst, uint64_t Addr, uint64_t Size,
uint64_t &Target) const override {
if (isConditionalBranch(Inst)) {
@@ -140,6 +170,15 @@ class RISCVMCInstrAnalysis : public MCInstrAnalysis {
return true;
}
+ if (Inst.getOpcode() == RISCV::JALR) {
+ if (auto TargetRegState = getGPRState(Inst.getOperand(1).getReg())) {
+ Target = *TargetRegState + Inst.getOperand(2).getImm();
+ return true;
+ }
+
+ return false;
+ }
+
return false;
}
diff --git a/llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s b/llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s
index 5fec4e6e25a39a3..ebd86a702b70e7c 100644
--- a/llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s
+++ b/llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s
@@ -57,11 +57,11 @@ c.jal bar
c.j bar
# CHECK: auipc ra, 0
-# CHECK: jalr ra, 16(ra){{$}}
+# CHECK: jalr ra, 16(ra) <foo+0x58>
call .Llocal
# CHECK: auipc ra, 0
-# CHECK: jalr ra, 16(ra){{$}}
+# CHECK: jalr ra, 16(ra) <bar>
call bar
.Llocal:
diff --git a/llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s b/llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s
new file mode 100644
index 000000000000000..c1a05032e261857
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s
@@ -0,0 +1,18 @@
+# RUN: llvm-mc -filetype=obj -triple riscv32 -mattr=+c < %s | \
+# RUN: llvm-objdump -d -M no-aliases --no-show-raw-insn - | \
+# RUN: FileCheck %s
+
+## Test multiple interleaved auipc/jalr pairs
+# CHECK: auipc t0, 0
+1: auipc t0, %pcrel_hi(bar)
+# CHECK: auipc t1, 0
+2: auipc t1, %pcrel_hi(bar)
+# CHECK: jalr ra, 16(t0) <bar>
+jalr %pcrel_lo(1b)(t0)
+# CHECK: jalr ra, 12(t1) <bar>
+jalr %pcrel_lo(2b)(t1)
+
+# CHECK-LABEL: <bar>:
+bar:
+# CHECK: c.nop
+nop
>From 161279cf4de3f108d72a4a69ab7093c1fdf3572a Mon Sep 17 00:00:00 2001
From: Job Noorman <jnoorman at igalia.com>
Date: Thu, 7 Sep 2023 11:05:17 +0200
Subject: [PATCH 3/5] fixup! [RISCV][MC] Implement evaluateBranch for
auipc+jalr pairs
Make sure `updateState` clears register state when registers may be
clobbered:
- For any unrecognized instruction, clear its RD;
- For terminators, clear the entire state since the sequentially next
instruction will be the start of a new basic block;
- For calls, clear the entire state since any register may be clobbered
by the callee.
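
The three clobber rules can be modelled compactly as follows. This is a
sketch, not the patch itself: the `Kind` categories, `SketchInst` and
`ClobberTracker` are hypothetical, whereas the real implementation derives
the same information from the instruction description (`isTerminator`,
`isCall`, `getNumDefs`).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical instruction categories used only for this illustration.
enum class Kind { Auipc, OtherDef, Terminator, Call };

struct SketchInst {
  Kind K;
  unsigned Rd; // destination register number, 0..31
  int64_t Imm; // 20-bit immediate for Auipc, unused otherwise
};

class ClobberTracker {
  std::optional<int64_t> GPR[32]; // slot 0 (x0) is never given a value

public:
  void reset() {
    for (auto &V : GPR)
      V.reset();
  }

  void update(const SketchInst &I, uint64_t Addr) {
    assert(I.Rd < 32 && "invalid GPR number");
    switch (I.K) {
    case Kind::Terminator: // the next instruction starts a new basic block
    case Kind::Call:       // the callee may clobber any register
      reset();
      break;
    case Kind::Auipc:
      if (I.Rd != 0)
        GPR[I.Rd] = static_cast<int64_t>(Addr) + I.Imm * 4096;
      break;
    case Kind::OtherDef: // unrecognized instruction: forget only its rd
      GPR[I.Rd].reset();
      break;
    }
  }

  std::optional<int64_t> value(unsigned Reg) const {
    assert(Reg < 32 && "invalid GPR number");
    if (Reg == 0)
      return 0;
    return GPR[Reg];
  }
};
```

A write to an unrelated register leaves a tracked auipc result intact, while
a write to the same register, a terminator, or a call discards it. These are
the cases the updated test exercises.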
---
.../RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp | 31 +++++++++++++++++--
.../ELF/RISCV/multi-instr-target.s | 31 +++++++++++++++++--
2 files changed, 57 insertions(+), 5 deletions(-)
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
index 848f5b7e6d66fea..5349fa0a4b30881 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
@@ -116,15 +116,19 @@ namespace {
class RISCVMCInstrAnalysis : public MCInstrAnalysis {
std::optional<int64_t> GPRState[31];
- void setGPRState(unsigned Reg, int64_t Value) {
- assert(Reg >= RISCV::X0 && Reg <= RISCV::X31 && "Invalid GPR reg");
+ static bool isGPR(unsigned Reg) {
+ return Reg >= RISCV::X0 && Reg <= RISCV::X31;
+ }
+
+ void setGPRState(unsigned Reg, std::optional<int64_t> Value) {
+ assert(isGPR(Reg) && "Invalid GPR reg");
if (Reg != RISCV::X0)
GPRState[Reg - RISCV::X1] = Value;
}
std::optional<int64_t> getGPRState(unsigned Reg) const {
- assert(Reg >= RISCV::X0 && Reg <= RISCV::X31 && "Invalid GPR reg");
+ assert(isGPR(Reg) && "Invalid GPR reg");
if (Reg == RISCV::X0)
return 0;
@@ -140,7 +144,28 @@ class RISCVMCInstrAnalysis : public MCInstrAnalysis {
}
void updateState(const MCInst &Inst, uint64_t Addr) override {
+ // Terminators mark the end of a basic block which means the sequentially
+ // next instruction will be the first of another basic block and the current
+ // state will typically not be valid anymore. For calls, we assume all
+ // registers may be clobbered by the callee (TODO: should we take the
+ // calling convention into account?).
+ if (isTerminator(Inst) || isCall(Inst)) {
+ resetState();
+ return;
+ }
+
switch (Inst.getOpcode()) {
+ default: {
+ // Clear the state of all defined registers for instructions that we don't
+ // explicitly support.
+ auto NumDefs = Info->get(Inst.getOpcode()).getNumDefs();
+ for (unsigned I = 0; I < NumDefs; ++I) {
+ auto DefReg = Inst.getOperand(I).getReg();
+ if (isGPR(DefReg))
+ setGPRState(DefReg, std::nullopt);
+ }
+ break;
+ }
case RISCV::AUIPC:
setGPRState(Inst.getOperand(0).getReg(),
Addr + (Inst.getOperand(1).getImm() << 12));
diff --git a/llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s b/llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s
index c1a05032e261857..91b643e961fc6df 100644
--- a/llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s
+++ b/llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s
@@ -7,11 +7,38 @@
1: auipc t0, %pcrel_hi(bar)
# CHECK: auipc t1, 0
2: auipc t1, %pcrel_hi(bar)
-# CHECK: jalr ra, 16(t0) <bar>
+# CHECK: jalr ra, {{[0-9]+}}(t0) <bar>
jalr %pcrel_lo(1b)(t0)
-# CHECK: jalr ra, 12(t1) <bar>
+## Target should not be printed because the call above clobbers register state
+# CHECK: jalr ra, {{[0-9]+}}(t1){{$}}
jalr %pcrel_lo(2b)(t1)
+## Test that auipc+jalr with a write to the target register in between does not
+## print the target
+# CHECK: auipc t0, 0
+1: auipc t0, %pcrel_hi(bar)
+# CHECK: c.li t0, 0
+li t0, 0
+# CHECK: jalr ra, {{[0-9]+}}(t0){{$}}
+jalr %pcrel_lo(1b)(t0)
+
+## Test that auipc+jalr with a write to an unrelated register in between does
+## print the target
+# CHECK: auipc t0, 0
+1: auipc t0, %pcrel_hi(bar)
+# CHECK: c.li t1, 0
+li t1, 0
+# CHECK: jalr ra, {{[0-9]+}}(t0) <bar>
+jalr %pcrel_lo(1b)(t0)
+
+## Test that auipc+jalr with a terminator in between does not print the target
+# CHECK: auipc t0, 0
+1: auipc t0, %pcrel_hi(bar)
+# CHECK: c.j {{.*}} <bar>
+j bar
+# CHECK: jalr ra, {{[0-9]+}}(t0){{$}}
+jalr %pcrel_lo(1b)(t0)
+
# CHECK-LABEL: <bar>:
bar:
# CHECK: c.nop
>From a9b9c043030b62a583fdc9b10c3cc15dbde89ab2 Mon Sep 17 00:00:00 2001
From: Job Noorman <jnoorman at igalia.com>
Date: Mon, 2 Oct 2023 10:32:32 +0200
Subject: [PATCH 4/5] fixup! [RISCV][MC] Implement evaluateBranch for
auipc+jalr pairs
Reset state when disassembly fails.
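
The reason for this reset can be shown with a toy model. Everything here is
hypothetical (a one-byte "ISA", `TinyState`, `tinyDecode`, `scan`); it only
illustrates why undecodable bytes should invalidate tracked state.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical one-byte "ISA": 0x01 defines the tracked register, 0x02 uses
// it, and any other byte fails to decode.
struct TinyState {
  bool Known = false;
  void resetState() { Known = false; }
  void updateState(uint8_t Op) {
    if (Op == 0x01)
      Known = true;
  }
};

inline std::optional<uint8_t> tinyDecode(uint8_t Byte) {
  if (Byte == 0x01 || Byte == 0x02)
    return Byte;
  return std::nullopt;
}

// Returns, for each 0x02 "use", whether the tracked value was still known.
inline std::vector<bool> scan(const std::vector<uint8_t> &Bytes) {
  TinyState S;
  std::vector<bool> Uses;
  for (uint8_t B : Bytes) {
    if (auto Op = tinyDecode(B)) {
      if (*Op == 0x02)
        Uses.push_back(S.Known);
      S.updateState(*Op);
    } else {
      // Undecodable bytes may hide a write to the tracked register, so any
      // state gathered before them is dropped.
      S.resetState();
    }
  }
  return Uses;
}
```

Without the `else` branch, a use that follows undecodable bytes would still
see the value recorded before them.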
---
llvm/tools/llvm-objdump/llvm-objdump.cpp | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp
index 8f6479d3c6e31e4..385be9fb9257e16 100644
--- a/llvm/tools/llvm-objdump/llvm-objdump.cpp
+++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp
@@ -1304,6 +1304,8 @@ collectLocalBranchTargets(ArrayRef<uint8_t> Bytes, MCInstrAnalysis *MIA,
!(STI->getTargetTriple().isPPC() && Target == Index))
Labels[Target] = ("L" + Twine(LabelCount++)).str();
MIA->updateState(Inst, Index);
+ } else if (!Disassembled && MIA) {
+ MIA->resetState();
}
Index += Size;
}
@@ -2166,6 +2168,8 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
}
DT->InstrAnalysis->updateState(Inst, SectionAddr + Index);
+ } else if (!Disassembled && DT->InstrAnalysis) {
+ DT->InstrAnalysis->resetState();
}
}
>From 0c03d56c38f63260dbbdeeb821d188b33bb0cf6d Mon Sep 17 00:00:00 2001
From: Job Noorman <jnoorman at igalia.com>
Date: Wed, 4 Oct 2023 08:32:52 +0200
Subject: [PATCH 5/5] fixup! [RISCV][MC] Implement evaluateBranch for
auipc+jalr pairs
Use array+bitset instead of array of optionals
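
The representation change can be sketched standalone as below. The class and
method names (`BitsetGprState`, `set`, `get`, `reset`) are hypothetical and
registers are plain numbers; the layout of a value array plus a validity mask
follows the description above.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <optional>

class BitsetGprState {
  int64_t Values[31] = {}; // values for x1..x31
  std::bitset<31> Valid;   // bit I set means Values[I] is meaningful

  static unsigned index(unsigned Reg) {
    assert(Reg >= 1 && Reg <= 31 && "x0 has no slot");
    return Reg - 1;
  }

public:
  void set(unsigned Reg, std::optional<int64_t> V) {
    if (Reg == 0)
      return; // writes to x0 are ignored
    unsigned I = index(Reg);
    if (V) {
      Values[I] = *V;
      Valid.set(I);
    } else {
      Valid.reset(I);
    }
  }

  std::optional<int64_t> get(unsigned Reg) const {
    if (Reg == 0)
      return 0; // x0 always reads as zero
    unsigned I = index(Reg);
    if (Valid.test(I))
      return Values[I];
    return std::nullopt;
  }

  // Clearing the mask is enough; stale entries in Values are never read.
  void reset() { Valid.reset(); }
};
```

One consequence of this layout is that a full reset clears a single bitset
rather than assigning 31 optionals.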
---
.../RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp | 35 +++++++++++++------
1 file changed, 25 insertions(+), 10 deletions(-)
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
index 5349fa0a4b30881..79e56a7a6d03d77 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
@@ -31,6 +31,7 @@
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/MC/TargetRegistry.h"
#include "llvm/Support/ErrorHandling.h"
+#include <bitset>
#define GET_INSTRINFO_MC_DESC
#define ENABLE_INSTR_PREDICATE_VERIFIER
@@ -114,34 +115,48 @@ static MCTargetStreamer *createRISCVNullTargetStreamer(MCStreamer &S) {
namespace {
class RISCVMCInstrAnalysis : public MCInstrAnalysis {
- std::optional<int64_t> GPRState[31];
+ int64_t GPRState[31] = {};
+ std::bitset<31> GPRValidMask;
static bool isGPR(unsigned Reg) {
return Reg >= RISCV::X0 && Reg <= RISCV::X31;
}
+ static unsigned getRegIndex(unsigned Reg) {
+ assert(isGPR(Reg) && Reg != RISCV::X0 && "Invalid GPR reg");
+ return Reg - RISCV::X1;
+ }
+
void setGPRState(unsigned Reg, std::optional<int64_t> Value) {
- assert(isGPR(Reg) && "Invalid GPR reg");
+ if (Reg == RISCV::X0)
+ return;
+
+ auto Index = getRegIndex(Reg);
- if (Reg != RISCV::X0)
- GPRState[Reg - RISCV::X1] = Value;
+ if (Value) {
+ GPRState[Index] = *Value;
+ GPRValidMask.set(Index);
+ } else {
+ GPRValidMask.reset(Index);
+ }
}
std::optional<int64_t> getGPRState(unsigned Reg) const {
- assert(isGPR(Reg) && "Invalid GPR reg");
-
if (Reg == RISCV::X0)
return 0;
- return GPRState[Reg - RISCV::X1];
+
+ auto Index = getRegIndex(Reg);
+
+ if (GPRValidMask.test(Index))
+ return GPRState[Index];
+ return std::nullopt;
}
public:
explicit RISCVMCInstrAnalysis(const MCInstrInfo *Info)
: MCInstrAnalysis(Info) {}
- void resetState() override {
- std::fill(std::begin(GPRState), std::end(GPRState), std::nullopt);
- }
+ void resetState() override { GPRValidMask.reset(); }
void updateState(const MCInst &Inst, uint64_t Addr) override {
// Terminators mark the end of a basic block which means the sequentially