[llvm] [BOLT][Linux] Support ORC for alternative instructions (PR #96709)

via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 25 15:56:35 PDT 2024


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-bolt

Author: Maksim Panchenko (maksfb)

<details>
<summary>Changes</summary>

Alternative instruction sequences in the Linux kernel can modify the stack and thus they need their own ORC unwind entries. Since there's only one ORC table, it has to be "shared" among multiple instruction sequences. The kernel achieves this by putting a restriction on instruction boundaries. If ORC state changes at a given IP, only one of the alternative sequences can have an instruction starting/ending at this IP. Then, developers can insert NOPs to guarantee the above requirement is met.

The most common use of ORC with alternatives is "pushf; pop %rax" sequence used for paravirtualization.

Before we implement a better support for alternatives, we can safely skip ORC entries associated with them.

Fixes #<!-- -->87052.

---
Full diff: https://github.com/llvm/llvm-project/pull/96709.diff


3 Files Affected:

- (modified) bolt/include/bolt/Core/BinaryFunction.h (+4) 
- (modified) bolt/lib/Core/BinaryFunction.cpp (+12) 
- (modified) bolt/lib/Rewrite/LinuxKernelRewriter.cpp (+23-6) 


``````````diff
diff --git a/bolt/include/bolt/Core/BinaryFunction.h b/bolt/include/bolt/Core/BinaryFunction.h
index 3c641581e247a..1048e50cd8058 100644
--- a/bolt/include/bolt/Core/BinaryFunction.h
+++ b/bolt/include/bolt/Core/BinaryFunction.h
@@ -930,6 +930,10 @@ class BinaryFunction {
     return const_cast<BinaryFunction *>(this)->getInstructionAtOffset(Offset);
   }
 
+  /// When the function is in disassembled state, return an instruction that
+  /// contains the \p Offset.
+  MCInst *getInstructionContainingOffset(uint64_t Offset);
+
   std::optional<MCInst> disassembleInstructionAtOffset(uint64_t Offset) const;
 
   /// Return offset for the first instruction. If there is data at the
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index c608ff40c6d9c..3b5a93b39ea17 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -4472,6 +4472,18 @@ MCInst *BinaryFunction::getInstructionAtOffset(uint64_t Offset) {
   }
 }
 
+MCInst *BinaryFunction::getInstructionContainingOffset(uint64_t Offset) {
+  assert(CurrentState == State::Disassembled && "Wrong function state");
+
+  if (Offset > Size)
+    return nullptr;
+
+  auto II = Instructions.upper_bound(Offset);
+  assert(II != Instructions.begin() && "First instruction not at offset 0");
+  --II;
+  return &II->second;
+}
+
 void BinaryFunction::printLoopInfo(raw_ostream &OS) const {
   if (!opts::shouldPrint(*this))
     return;
diff --git a/bolt/lib/Rewrite/LinuxKernelRewriter.cpp b/bolt/lib/Rewrite/LinuxKernelRewriter.cpp
index aa18e79f73b9d..fa5be036c7920 100644
--- a/bolt/lib/Rewrite/LinuxKernelRewriter.cpp
+++ b/bolt/lib/Rewrite/LinuxKernelRewriter.cpp
@@ -295,9 +295,6 @@ class LinuxKernelRewriter final : public MetadataRewriter {
     if (Error E = processSMPLocks())
       return E;
 
-    if (Error E = readORCTables())
-      return E;
-
     if (Error E = readStaticCalls())
       return E;
 
@@ -313,6 +310,9 @@ class LinuxKernelRewriter final : public MetadataRewriter {
     if (Error E = readAltInstructions())
       return E;
 
+    if (Error E = readORCTables())
+      return E;
+
     if (Error E = readPCIFixupTable())
       return E;
 
@@ -563,11 +563,28 @@ Error LinuxKernelRewriter::readORCTables() {
     if (!BF->hasInstructions())
       continue;
 
-    MCInst *Inst = BF->getInstructionAtOffset(IP - BF->getAddress());
-    if (!Inst)
+    const uint64_t Offset = IP - BF->getAddress();
+    MCInst *Inst = BF->getInstructionAtOffset(Offset);
+    if (!Inst) {
+      // Check if there is an alternative instruction(s) at this IP. Multiple
+      // alternative instructions can take a place of a single original
+      // instruction and each alternative can have a separate ORC entry.
+      // Since ORC table is shared between all alternative sequences, there's
+      // a requirement that only one (out of many) sequences can have an
+      // instruction from the ORC table to avoid ambiguities/conflicts.
+      //
+      // For now, we have limited support for alternatives. I.e. we still print
+      // functions with them, but will not change the code in the output binary.
+      // As such, we can ignore alternative ORC entries. They will be preserved
+      // in the binary, but will not get printed in the instruction stream.
+      Inst = BF->getInstructionContainingOffset(Offset);
+      if (Inst || BC.MIB->hasAnnotation(*Inst, "AltInst"))
+        continue;
+
       return createStringError(
           errc::executable_format_error,
           "no instruction at address 0x%" PRIx64 " in .orc_unwind_ip", IP);
+    }
 
     // Some addresses will have two entries associated with them. The first
     // one being a "weak" section terminator. Since we ignore the terminator,
@@ -1440,7 +1457,7 @@ Error LinuxKernelRewriter::tryReadAltInstructions(uint32_t AltInstFeatureSize,
       AltBF->setIgnored();
     }
 
-    if (!BF || !BC.shouldEmit(*BF))
+    if (!BF || !BF->hasInstructions())
       continue;
 
     if (OrgInstAddress + OrgSize > BF->getAddress() + BF->getSize())

``````````

</details>


https://github.com/llvm/llvm-project/pull/96709


More information about the llvm-commits mailing list