[llvm] [BOLT][RISCV] Implement basic instrumentation (PR #71664)

Job Noorman via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 8 03:47:50 PST 2023


https://github.com/mtvec created https://github.com/llvm/llvm-project/pull/71664

This patch implements support for instrumenting direct function calls and jumps on RISC-V (anything that relies on `createInstrIncMemory`) as well as the necessary runtime library support (init, fini, and syscall wrappers).

Only `createInstrIncMemory` is implemented by this patch. However, most of the other `create*` functions are called when instrumenting  even the simplest binary on RISC-V and their default implementation triggers an assert. Therefore, dummy implementations are added by this patch.

Atomic instructions (used to increment counters) are only available on RISC-V when the A extension is used. A check was added to err when trying to instrument a binary that was compiled without this extension enabled.

The syscall wrappers in the runtime library have been implemented by leveraging the fact that the syscall calling convention matches the user space one (at least when up to six scalars are passed as arguments): for every wrapper, a forward declaration is defined in C whereas its implementation is defined in assembly and simply loads the syscall number and uses `ecall`. This way, the compiler will prepare all the arguments in the correct registers when calling the function, saving us from a lot of repetitive work. The implementation is done in global assembly (instead of inline assembly in a C function) to make sure the compiler cannot optimize away the call and skip preparing the argument registers.

I've taken a slightly different approach than the existing targets with respect to the data types used by syscalls. I'm not quite sure why `common.h` defines some system types but I believe it is fine to just use the system headers for this (we only want to avoid linking against system libraries; using their headers should not cause problems). For this reason, I've made sure the definitions in `common.h` are not used for RISC-V.

>From 8ef008f63b231e120beb532cc1ff2783976b4cb4 Mon Sep 17 00:00:00 2001
From: Job Noorman <jnoorman at igalia.com>
Date: Thu, 12 Oct 2023 10:58:58 +0200
Subject: [PATCH] [BOLT][RISCV] Implement basic instrumentation

This patch implements support for instrumenting direct function calls
and jumps on RISC-V (anything that relies on `createInstrIncMemory`) as
well as the necessary runtime library support (init, fini, and syscall
wrappers).

Only `createInstrIncMemory` is implemented by this patch. However, most
of the other `create*` functions are called when instrumenting  even the
simplest binary on RISC-V and their default implementation triggers an
assert. Therefore, dummy implementations are added by this patch.

Atomic instructions (used to increment counters) are only available on
RISC-V when the A extension is used. A check was added to err when
trying to instrument a binary that was compiled without this extension
enabled.

The syscall wrappers in the runtime library have been implemented by
leveraging the fact that the syscall calling convention matches the user
space one (at least when up to six scalars are passed as arguments): for
every wrapper, a forward declaration is defined in C whereas its
implementation is defined in assembly and simply loads the syscall
number and uses `ecall`. This way, the compiler will prepare all the
arguments in the correct registers when calling the function, saving us
from a lot of repetitive work. The implementation is done in global
assembly (instead of inline assembly in a C function) to make sure the
compiler cannot optimize away the call and skip preparing the argument
registers.

I've taken a slightly different approach than the existing targets with
respect to the data types used by syscalls. I'm not quite sure why
`common.h` defines some system types but I believe it is fine to just
use the system headers for this (we only want to avoid linking against
system libraries; using their headers should not cause problems). For
this reason, I've made sure the definitions in `common.h` are not used
for RISC-V.
---
 bolt/CMakeLists.txt                           |   3 +-
 bolt/lib/Target/RISCV/CMakeLists.txt          |   2 +-
 bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp  | 166 ++++++++++++++++++
 bolt/runtime/common.h                         |   4 +
 bolt/runtime/instr.cpp                        |  20 +++
 bolt/runtime/sys_riscv64.h                    |  94 ++++++++++
 .../runtime/RISCV/basic-instrumentation.s     |  33 ++++
 bolt/test/runtime/RISCV/lit.local.cfg         |   2 +
 8 files changed, 322 insertions(+), 2 deletions(-)
 create mode 100644 bolt/runtime/sys_riscv64.h
 create mode 100644 bolt/test/runtime/RISCV/basic-instrumentation.s
 create mode 100644 bolt/test/runtime/RISCV/lit.local.cfg

diff --git a/bolt/CMakeLists.txt b/bolt/CMakeLists.txt
index f163d45342874d0..350a414b21c9c0c 100644
--- a/bolt/CMakeLists.txt
+++ b/bolt/CMakeLists.txt
@@ -33,7 +33,8 @@ endforeach()
 
 set(BOLT_ENABLE_RUNTIME_default OFF)
 if ((CMAKE_SYSTEM_PROCESSOR STREQUAL "x86_64"
-    OR CMAKE_SYSTEM_PROCESSOR STREQUAL "aarch64")
+    OR CMAKE_SYSTEM_PROCESSOR STREQUAL "aarch64"
+    OR CMAKE_SYSTEM_PROCESSOR STREQUAL "riscv64")
     AND (CMAKE_SYSTEM_NAME STREQUAL "Linux"
       OR CMAKE_SYSTEM_NAME STREQUAL "Darwin")
     AND (NOT CMAKE_CROSSCOMPILING))
diff --git a/bolt/lib/Target/RISCV/CMakeLists.txt b/bolt/lib/Target/RISCV/CMakeLists.txt
index 7f9557606320088..8c6f6f79e7401ac 100644
--- a/bolt/lib/Target/RISCV/CMakeLists.txt
+++ b/bolt/lib/Target/RISCV/CMakeLists.txt
@@ -13,7 +13,7 @@ add_llvm_library(LLVMBOLTTargetRISCV
   RISCVCommonTableGen
   )
 
-target_link_libraries(LLVMBOLTTargetRISCV PRIVATE LLVMBOLTCore)
+target_link_libraries(LLVMBOLTTargetRISCV PRIVATE LLVMBOLTCore LLVMBOLTUtils)
 
 include_directories(
   ${LLVM_MAIN_SRC_DIR}/lib/Target/RISCV
diff --git a/bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp b/bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp
index e19c070b081733c..cb3a805c254ffb3 100644
--- a/bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp
+++ b/bolt/lib/Target/RISCV/RISCVMCPlusBuilder.cpp
@@ -13,8 +13,11 @@
 #include "MCTargetDesc/RISCVMCExpr.h"
 #include "MCTargetDesc/RISCVMCTargetDesc.h"
 #include "bolt/Core/MCPlusBuilder.h"
+#include "bolt/Utils/CommandLineOpts.h"
 #include "llvm/BinaryFormat/ELF.h"
+#include "llvm/MC/MCContext.h"
 #include "llvm/MC/MCInst.h"
+#include "llvm/MC/MCInstBuilder.h"
 #include "llvm/MC/MCSubtargetInfo.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
@@ -30,6 +33,7 @@ namespace {
 
 class RISCVMCPlusBuilder : public MCPlusBuilder {
 public:
+  using MCPlusBuilder::createLoad;
   using MCPlusBuilder::MCPlusBuilder;
 
   bool equals(const MCTargetExpr &A, const MCTargetExpr &B,
@@ -320,6 +324,8 @@ class RISCVMCPlusBuilder : public MCPlusBuilder {
     default:
       return false;
     case RISCV::C_J:
+    case RISCV::PseudoCALL:
+    case RISCV::PseudoTAIL:
       OpNum = 0;
       return true;
     case RISCV::AUIPC:
@@ -497,6 +503,160 @@ class RISCVMCPlusBuilder : public MCPlusBuilder {
       return 2;
     return 4;
   }
+
+  InstructionListType
+  createInstrIncMemory(const MCSymbol *Target, MCContext *Ctx, bool IsLeaf,
+                       unsigned CodePointerSize) const override {
+    // We need 2 scratch registers: one for the target address (t0/x5), and one
+    // for the increment value (t1/x6).
+    // addi sp, sp, -16
+    // sd t0, 0(sp)
+    // sd t1, 8(sp)
+    // la t0, target         # 1: auipc t0, %pcrel_hi(target)
+    //                       # addi t0, t0, %pcrel_lo(1b)
+    // li t1, 1              # addi t1, zero, 1
+    // amoadd.d zero, t0, t1
+    // ld t0, 0(sp)
+    // ld t1, 8(sp)
+    // addi sp, sp, 16
+    InstructionListType Insts;
+    spillRegs(Insts, {RISCV::X5, RISCV::X6});
+
+    createLA(Insts, RISCV::X5, Target, *Ctx);
+
+    MCInst LI = MCInstBuilder(RISCV::ADDI)
+                    .addReg(RISCV::X6)
+                    .addReg(RISCV::X0)
+                    .addImm(1);
+    Insts.push_back(LI);
+
+    MCInst AMOADD = MCInstBuilder(RISCV::AMOADD_D)
+                        .addReg(RISCV::X0)
+                        .addReg(RISCV::X5)
+                        .addReg(RISCV::X6);
+    Insts.push_back(AMOADD);
+
+    reloadRegs(Insts, {RISCV::X5, RISCV::X6});
+    return Insts;
+  }
+
+  InstructionListType createInstrumentedIndirectCall(MCInst &&CallInst,
+                                                     MCSymbol *HandlerFuncAddr,
+                                                     int CallSiteID,
+                                                     MCContext *Ctx) override {
+    return {};
+  }
+
+  InstructionListType
+  createInstrumentedIndCallHandlerEntryBB(const MCSymbol *InstrTrampoline,
+                                          const MCSymbol *IndCallHandler,
+                                          MCContext *Ctx) override {
+    return {};
+  }
+
+  InstructionListType createInstrumentedIndCallHandlerExitBB() const override {
+    return {};
+  }
+
+  InstructionListType
+  createInstrumentedIndTailCallHandlerExitBB() const override {
+    return {};
+  }
+
+  InstructionListType createNumCountersGetter(MCContext *Ctx) const override {
+    return {};
+  }
+
+  InstructionListType
+  createInstrLocationsGetter(MCContext *Ctx) const override {
+    return {};
+  }
+
+  InstructionListType createInstrTablesGetter(MCContext *Ctx) const override {
+    return {};
+  }
+
+  InstructionListType createInstrNumFuncsGetter(MCContext *Ctx) const override {
+    return {};
+  }
+
+  InstructionListType createSymbolTrampoline(const MCSymbol *TgtSym,
+                                             MCContext *Ctx) override {
+    InstructionListType Insts;
+    createTailCall(Insts.emplace_back(), TgtSym, Ctx);
+    return Insts;
+  }
+
+  const RISCVMCExpr *createSymbolRefExpr(const MCSymbol *Target,
+                                         RISCVMCExpr::VariantKind VK,
+                                         MCContext &Ctx) const {
+    return RISCVMCExpr::create(MCSymbolRefExpr::create(Target, Ctx), VK, Ctx);
+  }
+
+  void createAuipcInstPair(InstructionListType &Insts, unsigned DestReg,
+                           const MCSymbol *Target, unsigned SecondOpcode,
+                           MCContext &Ctx) const {
+    MCInst AUIPC = MCInstBuilder(RISCV::AUIPC)
+                       .addReg(DestReg)
+                       .addExpr(createSymbolRefExpr(
+                           Target, RISCVMCExpr::VK_RISCV_PCREL_HI, Ctx));
+    MCSymbol *AUIPCLabel = Ctx.createNamedTempSymbol("pcrel_hi");
+    setLabel(AUIPC, AUIPCLabel);
+    Insts.push_back(AUIPC);
+
+    MCInst SecondInst =
+        MCInstBuilder(SecondOpcode)
+            .addReg(DestReg)
+            .addReg(DestReg)
+            .addExpr(createSymbolRefExpr(AUIPCLabel,
+                                         RISCVMCExpr::VK_RISCV_PCREL_LO, Ctx));
+    Insts.push_back(SecondInst);
+  }
+
+  void createLA(InstructionListType &Insts, unsigned DestReg,
+                const MCSymbol *Target, MCContext &Ctx) const {
+    createAuipcInstPair(Insts, DestReg, Target, RISCV::ADDI, Ctx);
+  }
+
+  void createRegInc(MCInst &Inst, unsigned Reg, int64_t Imm) const {
+    Inst = MCInstBuilder(RISCV::ADDI).addReg(Reg).addReg(Reg).addImm(Imm);
+  }
+
+  void createSPInc(MCInst &Inst, int64_t Imm) const {
+    createRegInc(Inst, RISCV::X2, Imm);
+  }
+
+  void createStore(MCInst &Inst, unsigned Reg, unsigned BaseReg,
+                   int64_t Offset) const {
+    Inst = MCInstBuilder(RISCV::SD).addReg(Reg).addReg(BaseReg).addImm(Offset);
+  }
+
+  void createLoad(MCInst &Inst, unsigned Reg, unsigned BaseReg,
+                  int64_t Offset) const {
+    Inst = MCInstBuilder(RISCV::LD).addReg(Reg).addReg(BaseReg).addImm(Offset);
+  }
+
+  void spillRegs(InstructionListType &Insts,
+                 const SmallVector<unsigned> &Regs) const {
+    createSPInc(Insts.emplace_back(), -Regs.size() * 8);
+
+    int64_t Offset = 0;
+    for (auto Reg : Regs) {
+      createStore(Insts.emplace_back(), Reg, RISCV::X2, Offset);
+      Offset += 8;
+    }
+  }
+
+  void reloadRegs(InstructionListType &Insts,
+                  const SmallVector<unsigned> &Regs) const {
+    int64_t Offset = 0;
+    for (auto Reg : Regs) {
+      createLoad(Insts.emplace_back(), Reg, RISCV::X2, Offset);
+      Offset += 8;
+    }
+
+    createSPInc(Insts.emplace_back(), Regs.size() * 8);
+  }
 };
 
 } // end anonymous namespace
@@ -508,6 +668,12 @@ MCPlusBuilder *createRISCVMCPlusBuilder(const MCInstrAnalysis *Analysis,
                                         const MCInstrInfo *Info,
                                         const MCRegisterInfo *RegInfo,
                                         const MCSubtargetInfo *STI) {
+  if (opts::Instrument && !STI->getFeatureBits()[RISCV::FeatureStdExtA]) {
+    errs() << "BOLT-ERROR: Instrumention on RISC-V requires the A extension "
+              "but it is not enabled for the input binary";
+    exit(1);
+  }
+
   return new RISCVMCPlusBuilder(Analysis, Info, RegInfo, STI);
 }
 
diff --git a/bolt/runtime/common.h b/bolt/runtime/common.h
index 9b9965bae524eb7..b9bc6dc44e736c3 100644
--- a/bolt/runtime/common.h
+++ b/bolt/runtime/common.h
@@ -123,6 +123,7 @@ int memcmp(const void *s1, const void *s2, size_t n) {
 // Anonymous namespace covering everything but our library entry point
 namespace {
 
+#ifndef __riscv
 struct dirent64 {
   uint64_t d_ino;          /* Inode number */
   int64_t d_off;           /* Offset to next linux_dirent */
@@ -150,9 +151,12 @@ struct timespec {
   uint64_t tv_sec;  /* seconds */
   uint64_t tv_nsec; /* nanoseconds */
 };
+#endif
 
 #if defined(__aarch64__)
 #include "sys_aarch64.h"
+#elif defined(__riscv)
+#include "sys_riscv64.h"
 #else
 #include "sys_x86_64.h"
 #endif
diff --git a/bolt/runtime/instr.cpp b/bolt/runtime/instr.cpp
index cfd113e805c500c..811fa1a6e986efd 100644
--- a/bolt/runtime/instr.cpp
+++ b/bolt/runtime/instr.cpp
@@ -1676,6 +1676,10 @@ extern "C" __attribute((naked)) void __bolt_instr_indirect_call()
                        "ret\n"
                        :::);
   // clang-format on
+#elif defined(__riscv)
+  // clang-format off
+  __asm__ __volatile__("ebreak");
+  // clang-format on
 #else
   // clang-format off
   __asm__ __volatile__(SAVE_ALL
@@ -1700,6 +1704,10 @@ extern "C" __attribute((naked)) void __bolt_instr_indirect_tailcall()
                        "ret\n"
                        :::);
   // clang-format on
+#elif defined(__riscv)
+  // clang-format off
+  __asm__ __volatile__("ebreak");
+  // clang-format on
 #else
   // clang-format off
   __asm__ __volatile__(SAVE_ALL
@@ -1726,6 +1734,16 @@ extern "C" __attribute((naked)) void __bolt_instr_start()
                        "br x16\n"
                        :::);
   // clang-format on
+#elif defined(__riscv)
+  // clang-format off
+  // Only a0 (fini address) needs to be saved. Simply store it temporarily in a
+  // callee-saved register.
+  __asm__ __volatile__("mv s0, a0\n"
+                       "call __bolt_instr_setup\n"
+                       "mv a0, s0\n"
+                       "tail __bolt_start_trampoline\n"
+                       :::);
+  // clang-format on
 #else
   // clang-format off
   __asm__ __volatile__(SAVE_ALL
@@ -1748,6 +1766,8 @@ extern "C" void __bolt_instr_fini() {
                        RESTORE_ALL
                        :::);
   // clang-format on
+#elif defined(__riscv)
+  __bolt_fini_trampoline();
 #else
   __asm__ __volatile__("call __bolt_fini_trampoline\n" :::);
 #endif
diff --git a/bolt/runtime/sys_riscv64.h b/bolt/runtime/sys_riscv64.h
new file mode 100644
index 000000000000000..05c8cc31e52e345
--- /dev/null
+++ b/bolt/runtime/sys_riscv64.h
@@ -0,0 +1,94 @@
+#ifndef LLVM_TOOLS_LLVM_BOLT_SYS_RISCV64
+#define LLVM_TOOLS_LLVM_BOLT_SYS_RISCV64
+
+#include <dirent.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdint.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <time.h>
+
+#define STRINGIFY(X) #X
+#define XSTRINGIFY(X) STRINGIFY(X)
+
+// Syscalls are wrapped as follows: we declare a C function with the same type
+// as the syscall. This function is then implemented in global assembly by
+// simply loading the syscall number and calling ecall. This way, the compiler
+// will correctly prepare all argument registers when calling the wrapper. This
+// works because the syscall calling convention matches the user space one.
+// clang-format off
+#define WRAP_SYSCALL(Name, ReturnType, ...) \
+  ReturnType __##Name(__VA_ARGS__); \
+  asm( \
+    XSTRINGIFY(__##Name) ":\n\t" \
+    "li a7, " XSTRINGIFY(SYS_##Name) "\n\t" \
+    "ecall\n\t" \
+    "ret\n\t" \
+  )
+// clang-format on
+
+extern "C" {
+
+WRAP_SYSCALL(read, ssize_t, int, void *, size_t);
+WRAP_SYSCALL(write, ssize_t, int, const void *, size_t);
+WRAP_SYSCALL(mmap, void *, size_t, size_t, int, int, int, off_t);
+WRAP_SYSCALL(munmap, int, void *, size_t);
+WRAP_SYSCALL(exit, void, int);
+WRAP_SYSCALL(openat, int, int, const char *, int, mode_t);
+WRAP_SYSCALL(readlinkat, ssize_t, int, const char *, char *, size_t);
+WRAP_SYSCALL(getdents64, long, unsigned int, struct dirent64 *, size_t);
+WRAP_SYSCALL(lseek, off_t, int, off_t, int);
+WRAP_SYSCALL(fsync, int, int);
+WRAP_SYSCALL(ftruncate, int, int, off_t);
+WRAP_SYSCALL(close, int, int);
+WRAP_SYSCALL(setpgid, int, pid_t, pid_t);
+WRAP_SYSCALL(getpgid, pid_t, pid_t);
+WRAP_SYSCALL(getpid, pid_t, );
+WRAP_SYSCALL(getppid, pid_t, );
+WRAP_SYSCALL(nanosleep, int, struct timespec *, struct timespec *);
+WRAP_SYSCALL(rt_sigprocmask, int, int, const void *, void *, size_t);
+WRAP_SYSCALL(kill, int, pid_t, int);
+WRAP_SYSCALL(clone, long, unsigned long, unsigned long, int *, unsigned long,
+             int *);
+
+pid_t __fork() {
+  return __clone(CLONE_CHILD_CLEARTID | CLONE_CHILD_SETTID | SIGCHLD, 0, 0, 0,
+                 0);
+}
+
+int __open(const char *pathname, int flags, mode_t mode) {
+  return __openat(AT_FDCWD, pathname, flags, mode);
+}
+
+ssize_t __readlink(const char *pathname, char *buf, size_t bufsize) {
+  return __readlinkat(AT_FDCWD, pathname, buf, bufsize);
+}
+
+int __sigprocmask(int how, const void *set, void *oldset) {
+  return __rt_sigprocmask(how, set, oldset, 8);
+}
+}
+
+// Anonymous namespace covering everything but our library entry point
+namespace {
+
+// Get the difference between runtime address of .text section and static
+// address in section header table. Can be extracted from arbitrary pc value
+// recorded at runtime to get the corresponding static address, which in turn
+// can be used to search for indirect call description. Needed because indirect
+// call descriptions are read-only non-relocatable data.
+uint64_t getTextBaseAddress() {
+  uint64_t DynAddr;
+  uint64_t StaticAddr;
+  __asm__ volatile("lla %0, __hot_end\n\t"
+                   "lui %1, %%hi(__hot_end)\n\t"
+                   "addi %1, %1, %%lo(__hot_end)\n\t"
+                   : "=r"(DynAddr), "=r"(StaticAddr));
+  return DynAddr - StaticAddr;
+}
+
+} // anonymous namespace
+
+#endif
diff --git a/bolt/test/runtime/RISCV/basic-instrumentation.s b/bolt/test/runtime/RISCV/basic-instrumentation.s
new file mode 100644
index 000000000000000..e926f98cef43b0e
--- /dev/null
+++ b/bolt/test/runtime/RISCV/basic-instrumentation.s
@@ -0,0 +1,33 @@
+# REQUIRES: system-linux,bolt-runtime
+
+# RUN: %clang %cflags -Wl,-q -o %t.exe %s
+# RUN: llvm-bolt --instrument --instrumentation-file=%t.fdata -o %t.instr %t.exe
+
+## Run the profiled binary and check that the profile reports at least that `f`
+## has been called.
+# RUN: rm -f %t.fdata
+# RUN: %t.instr
+# RUN: cat %t.fdata | FileCheck %s
+# CHECK: f 0 0 1{{$}}
+
+## Check BOLT works with this profile
+# RUN: llvm-bolt --data %t.fdata --reorder-blocks=cache -o %t.bolt %t.exe
+
+    .text
+    .globl main
+    .type main, @function
+main:
+    addi sp, sp, -8
+    sd ra, 0(sp)
+    call f
+    ld ra, 0(sp)
+    addi sp, sp, 8
+    li a0, 0
+    ret
+    .size main, .-main
+
+    .globl f
+    .type f, @function
+f:
+    ret
+    .size f, .-f
diff --git a/bolt/test/runtime/RISCV/lit.local.cfg b/bolt/test/runtime/RISCV/lit.local.cfg
new file mode 100644
index 000000000000000..c0627d905ab3b27
--- /dev/null
+++ b/bolt/test/runtime/RISCV/lit.local.cfg
@@ -0,0 +1,2 @@
+if config.host_arch not in ["riscv64"]:
+    config.unsupported = True



More information about the llvm-commits mailing list