[llvm] Users/paschalis mpeis/lite split functions cold layout (PR #96609)
Paschalis Mpeis via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 25 02:08:01 PDT 2024
https://github.com/paschalis-mpeis created https://github.com/llvm/llvm-project/pull/96609
[AArch64][BOLT] Ensure tentative code layout for cold BBs runs.
When BOLT is not processing all functions (e.g., when lite mode is
enabled or when multiple functions are ignored via the 'skip-funcs' flag),
we might skip the tentative code layout estimation for cold basic blocks.
However, due to the `-split-functions`` flag, we still need to compute
PC-relative distances between hot and cold basic blocks.
In such cases, BOLT will use '0x0' as the first address of the basic
block, leading to incorrect estimations of the necessary PC-relative
addresses. Consequently, the relaxStub method expands all those branches
to the short-jump sequence unnecessarily.
Such an unnecessary expansion by `relaxStub` is from:
```armasm
b .Ltmp1234
```
to:
```armasm
adrp x16, .Ltmp1234
add x16, x16, :lo12:.Ltmp1234
br x16
```
>From d98c6b973fbde67274584620172d91a6233110c1 Mon Sep 17 00:00:00 2001
From: Paschalis Mpeis <Paschalis.Mpeis at arm.com>
Date: Sun, 23 Jun 2024 19:36:32 +0100
Subject: [PATCH 1/2] BOLT may never run the tentative code layout for cold
BBs.
However, it might later require the addresses of cold BBs to compute
PC-relative jumps between hot and cold fragments of a function. Since
the addresses were never estimated, it uses 0x0 for cold BBs, resulting
in incorrect estimated addresses for any instructions within them.
As a result, LongJump often expands a branch that would otherwise fit in
a single instruction into a short jump.
For example, `relaxStub` might expand from:
```armasm
b .Ltmp1234
```
to:
```armasm
adrp x16, .Ltmp1234
add x16, x16, :lo12:.Ltmp1234
br x16
```
While this expansion is not wrong, it is unnecessary. Moreover, in some
large binaries this expansion has lead to runtime crashes. This should
be investigated further, as such expansions should not cause crashes.
---
bolt/lib/Passes/LongJmp.cpp | 5 +++++
bolt/test/AArch64/split-funcs-lite.test | 14 ++++++++++++++
2 files changed, 19 insertions(+)
create mode 100644 bolt/test/AArch64/split-funcs-lite.test
diff --git a/bolt/lib/Passes/LongJmp.cpp b/bolt/lib/Passes/LongJmp.cpp
index c483f70a836ee..c000b80fdb728 100644
--- a/bolt/lib/Passes/LongJmp.cpp
+++ b/bolt/lib/Passes/LongJmp.cpp
@@ -382,6 +382,11 @@ uint64_t LongJmpPass::tentativeLayoutRelocMode(
DotAddress += Func->estimateConstantIslandSize();
++CurrentIndex;
}
+ if (!ColdLayoutDone) {
+ errs() << "BOLT-ERROR: Did not perform tentative code layout for cold functions.\n";
+ exit(1);
+ }
+
// BBs
for (BinaryFunction *Func : SortedFunctions)
tentativeBBLayout(*Func);
diff --git a/bolt/test/AArch64/split-funcs-lite.test b/bolt/test/AArch64/split-funcs-lite.test
new file mode 100644
index 0000000000000..2f54c8f9444f9
--- /dev/null
+++ b/bolt/test/AArch64/split-funcs-lite.test
@@ -0,0 +1,14 @@
+// This test checks that tentative code layout for cold basic blocks runs
+// at least once, even when each function after the hot/cold frontier is not
+// emittable. This is done by ignoring each function using the 'skip-funcs' flag.
+// In a realistic scenario, this may happen when lite mode is enabled along
+// with a bolt profile.
+
+REQUIRES: system-linux
+
+RUN: %clang %cflags %p/../Inputs/asm_main.c -Wl,-q -o %t
+
+RUN: not llvm-bolt %t -o %t.bolt -lite=1 -split-functions -split-all-cold \
+RUN: --skip-funcs="_init,_start,call_weak_fn/1,deregister_tm_clones/1,register_tm_clones/1,__do_global_dtors_aux/1,frame_dummy/1,main,foo,_fini" 2>&1 | FileCheck %s
+
+CHECK: BOLT-ERROR: Did not perform tentative code layout for cold functions.
\ No newline at end of file
>From 7cd7950ee0825e1187c4d8b7d20aeef647809039 Mon Sep 17 00:00:00 2001
From: Paschalis Mpeis <Paschalis.Mpeis at arm.com>
Date: Mon, 24 Jun 2024 17:43:06 +0100
Subject: [PATCH 2/2] [AArch64][BOLT] Ensure tentative code layout for cold BBs
runs.
When BOLT is not processing all functions (e.g., when lite mode is
enabled or when multiple functions are ignored via the 'skip-funcs' flag),
we might skip the tentative code layout estimation for cold basic blocks.
However, due to the `-split-functions`` flag, we still need to compute
PC-relative distances between hot and cold basic blocks.
In such cases, BOLT will use '0x0' as the first address of the basic
block, leading to incorrect estimations of the necessary PC-relative
addresses. Consequently, the relaxStub method expands all those branches
to the short-jump sequence unnecessarily.
Such an unnecessary expansion by `relaxStub` is from:
```armasm
b .Ltmp1234
```
to:
```armasm
adrp x16, .Ltmp1234
add x16, x16, :lo12:.Ltmp1234
br x16
```
---
bolt/lib/Passes/LongJmp.cpp | 9 ++++++---
bolt/test/AArch64/split-funcs-lite.test | 4 ++--
2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/bolt/lib/Passes/LongJmp.cpp b/bolt/lib/Passes/LongJmp.cpp
index c000b80fdb728..053650f8da16c 100644
--- a/bolt/lib/Passes/LongJmp.cpp
+++ b/bolt/lib/Passes/LongJmp.cpp
@@ -11,6 +11,7 @@
//===----------------------------------------------------------------------===//
#include "bolt/Passes/LongJmp.h"
+#include "bolt/Utils/CommandLineOpts.h"
#define DEBUG_TYPE "longjmp"
@@ -324,7 +325,6 @@ uint64_t LongJmpPass::tentativeLayoutRelocColdPart(
uint64_t LongJmpPass::tentativeLayoutRelocMode(
const BinaryContext &BC, std::vector<BinaryFunction *> &SortedFunctions,
uint64_t DotAddress) {
-
// Compute hot cold frontier
uint32_t LastHotIndex = -1u;
uint32_t CurrentIndex = 0;
@@ -354,9 +354,12 @@ uint64_t LongJmpPass::tentativeLayoutRelocMode(
for (BinaryFunction *Func : SortedFunctions) {
if (!BC.shouldEmit(*Func)) {
HotAddresses[Func] = Func->getAddress();
- continue;
+ // Don't perform any tentative address estimation of a function's cold
+ // layout if it won't be emitted, unless we are ignoring a large number of
+ // functions (ie, on lite mode) and we haven't done such estimation yet.
+ if (opts::processAllFunctions() || ColdLayoutDone)
+ continue;
}
-
if (!ColdLayoutDone && CurrentIndex >= LastHotIndex) {
DotAddress =
tentativeLayoutRelocColdPart(BC, SortedFunctions, DotAddress);
diff --git a/bolt/test/AArch64/split-funcs-lite.test b/bolt/test/AArch64/split-funcs-lite.test
index 2f54c8f9444f9..c7a11918fe061 100644
--- a/bolt/test/AArch64/split-funcs-lite.test
+++ b/bolt/test/AArch64/split-funcs-lite.test
@@ -8,7 +8,7 @@ REQUIRES: system-linux
RUN: %clang %cflags %p/../Inputs/asm_main.c -Wl,-q -o %t
-RUN: not llvm-bolt %t -o %t.bolt -lite=1 -split-functions -split-all-cold \
+RUN: llvm-bolt %t -o %t.bolt -lite=1 -split-functions -split-all-cold \
RUN: --skip-funcs="_init,_start,call_weak_fn/1,deregister_tm_clones/1,register_tm_clones/1,__do_global_dtors_aux/1,frame_dummy/1,main,foo,_fini" 2>&1 | FileCheck %s
-CHECK: BOLT-ERROR: Did not perform tentative code layout for cold functions.
\ No newline at end of file
+CHECK-NOT: BOLT-ERROR: Did not perform tentative code layout for cold functions.
\ No newline at end of file
More information about the llvm-commits
mailing list