[all-commits] [llvm/llvm-project] ff5e2b: [BOLT] Improve handling of relocations targeting s...
Job Noorman via All-commits
all-commits at lists.llvm.org
Thu Oct 5 23:46:29 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: ff5e2babcb46e7eb3887ee265decb2948da2792c
https://github.com/llvm/llvm-project/commit/ff5e2babcb46e7eb3887ee265decb2948da2792c
Author: Job Noorman <jnoorman at igalia.com>
Date: 2023-10-06 (Fri, 06 Oct 2023)
Changed paths:
M bolt/include/bolt/Core/MCPlus.h
M bolt/include/bolt/Core/MCPlusBuilder.h
M bolt/include/bolt/Core/Relocation.h
M bolt/lib/Core/BinaryContext.cpp
M bolt/lib/Core/BinaryEmitter.cpp
M bolt/lib/Core/BinaryFunction.cpp
M bolt/lib/Core/MCPlusBuilder.cpp
M bolt/lib/Core/Relocation.cpp
M bolt/lib/Passes/BinaryPasses.cpp
M bolt/lib/Rewrite/RewriteInstance.cpp
M bolt/test/RISCV/reloc-abs.s
A bolt/test/RISCV/reloc-bb-split.s
M bolt/test/RISCV/reloc-got.s
M bolt/test/RISCV/reloc-pcrel.s
Log Message:
-----------
[BOLT] Improve handling of relocations targeting specific instructions (#66395)
On RISC-V, there are certain relocations that target a specific
instruction instead of a more abstract location like a function or basic
block. Take the following example that loads a value from symbol `foo`:
```
nop
1: auipc t0, %pcrel_hi(foo)
ld t0, %pcrel_lo(1b)(t0)
```
This results in two relocation:
- auipc: `R_RISCV_PCREL_HI20` referencing `foo`;
- ld: `R_RISCV_PCREL_LO12_I` referencing to local label `1` which points
to the auipc instruction.
It is of utmost importance that the `R_RISCV_PCREL_LO12_I` keeps
referring to the auipc instruction; if not, the program will fail to
assemble. However, BOLT currently does not guarantee this.
BOLT currently assumes that all local symbols are jump targets and
always starts a new basic block at symbol locations. The example above
results in a CFG the looks like this:
```
.BB0:
nop
.BB1:
auipc t0, %pcrel_hi(foo)
ld t0, %pcrel_lo(.BB1)(t0)
```
While this currently works (i.e., the `R_RISCV_PCREL_LO12_I` relocation
points to the correct instruction), it has two downsides:
- Too many basic blocks are created (the example above is logically only
one yet two are created);
- If instructions are inserted in `.BB1` (e.g., by instrumentation),
things will break since the label will not point to the auipc anymore.
This patch proposes to fix this issue by teaching BOLT to track labels
that should always point to a specific instruction. This is implemented
as follows:
- Add a new annotation type (`kLabel`) that allows us to annotate
instructions with an `MCSymbol *`;
- Whenever we encounter a relocation type that is used to refer to a
specific instruction (`Relocation::isInstructionReference`), we
register it without a symbol;
- During disassembly, whenever we encounter an instruction with such a
relocation, create a symbol for its target and store it in an offset
to symbol map (to ensure multiple relocations referencing the same
instruction use the same label);
- After disassembly, iterate this map to attach labels to instructions
via the new annotation type;
- During emission, emit these labels right before the instruction.
I believe the use of annotations works quite well for this use case as
it allows us to reliably track instruction labels. If we were to store
them as offsets in basic blocks, it would be error prone to keep them
updated whenever instructions are inserted or removed.
I have chosen to add labels as first-class annotations (as opposed to a
generic one) because the documentation of `MCAnnotation` suggests that
generic annotations are to be used for optional metadata that can be
discarded without affecting correctness. As this is not the case for
labels, a first-class annotation seemed more appropriate.
More information about the All-commits
mailing list