[llvm] [BOLT][AArch64] Enabling Inlining for Memcpy for AArch64 in BOLT (PR #154929)
via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 22 09:02:40 PDT 2025
================
@@ -1866,8 +1866,32 @@ Error InlineMemcpy::runOnFunctions(BinaryContext &BC) {
const bool IsMemcpy8 = (CalleeSymbol->getName() == "_memcpy8");
const bool IsTailCall = BC.MIB->isTailCall(Inst);
+ // Extract the size of the copy from preceding instructions by looking
+ // for writes to the size register
+ std::optional<uint64_t> KnownSize = std::nullopt;
+ BitVector WrittenRegs(BC.MRI->getNumRegs());
+
+ // Get the size register (3rd arg register, index 2 for AArch64)
----------------
yafet-a wrote:
The intention was that the architecture-specific dispatching should happen at the MCPlusBuilder level, not at the pass level. The `InlineMemcpy` pass was intended to be architecture-agnostic, with each architecture's `MCPlusBuilder` handling its own implementation details through virtual method overrides.
I added a new virtual method [`createInlineMemcpy(bool ReturnEnd, std::optional<uint64_t> KnownSize)` in MCPlusBuilder.h (lines 1898-1904)](https://github.com/yafet-a/llvm-project/blob/users/yafet-a/inlining-memcpy/bolt/include/bolt/Core/MCPlusBuilder.h#L1898-L1904) with a default fallback implementation:
```cpp
virtual InstructionListType
createInlineMemcpy(bool ReturnEnd,
                   std::optional<uint64_t> KnownSize) const {
  // Default implementation ignores KnownSize and uses the original method.
  return createInlineMemcpy(ReturnEnd);
}
```
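On the pass side, the call site then just forwards whatever size was recovered (sketched from the surrounding diff rather than quoted verbatim):
```cpp
const InstructionListType NewCode =
    BC.MIB->createInlineMemcpy(IsMemcpy8, KnownSize);
```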
This meant that:
- **X86**: Uses the default fallback and ignores the `KnownSize` parameter, because `REP MOVSB` is a single instruction that reads the size from `RCX` at runtime. It doesn't need compile-time size knowledge, so the existing behavior is untouched.
- **AArch64**: [Overrides the method in AArch64MCPlusBuilder.cpp (line 2620)](https://github.com/yafet-a/llvm-project/blob/users/yafet-a/inlining-memcpy/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp#L2620) to use `KnownSize` for generating optimal width-specific load/store sequences (see the sketch below).
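For illustration, a minimal sketch of what that override could look like (not the PR's exact code; it assumes the usual AArch64 memcpy convention of X0 = dest and X1 = src, uses X3 as a scratch register, ignores `ReturnEnd`, and only shows the 8-byte case):
```cpp
// Inside AArch64MCPlusBuilder (sketch only). Assumes "llvm/MC/MCInstBuilder.h".
InstructionListType
createInlineMemcpy(bool ReturnEnd,
                   std::optional<uint64_t> KnownSize) const override {
  InstructionListType Code;
  if (!KnownSize || *KnownSize != 8)
    return Code; // the real override handles other widths and the fallback

  // 8-byte copy: LDR X3, [X1] ; STR X3, [X0]
  Code.push_back(MCInstBuilder(AArch64::LDRXui)
                     .addReg(AArch64::X3)
                     .addReg(AArch64::X1)
                     .addImm(0));
  Code.push_back(MCInstBuilder(AArch64::STRXui)
                     .addReg(AArch64::X3)
                     .addReg(AArch64::X0)
                     .addImm(0));
  return Code;
}
```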
However, you make a good point, because the size extraction logic I added in [`BinaryPasses.cpp (lines 1869-1890)`](https://github.com/yafet-a/llvm-project/blob/users/yafet-a/inlining-memcpy/bolt/lib/Passes/BinaryPasses.cpp#L1869-L1890) is indeed AArch64-specific:
```cpp
// Extract size from preceding instructions (AArch64 only)
// Pattern: MOV X2, #nb-bytes; BL memcpy src, dest, X2
if (BC.isAArch64()) {
  MCPhysReg SizeReg = BC.MIB->getIntArgRegister(2); // X2 on AArch64
  BC.MIB->extractMoveImmediate(Inst, SizeReg);      // MOVZXi instruction
}
```
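Spelled out a bit more, the scan amounts to roughly the following (a sketch, not the committed code: `PrecedingInsts` stands in for the instructions of the basic block before the call, and `extractMoveImmediate` is assumed to return `std::optional<uint64_t>`):
```cpp
std::optional<uint64_t> KnownSize;
const MCPhysReg SizeReg = BC.MIB->getIntArgRegister(2); // X2 carries the size

// Look for instructions before the call that write the size register; the
// last such write before the call determines whether the size is known.
for (const MCInst &Prev : PrecedingInsts) {
  BitVector WrittenRegs(BC.MRI->getNumRegs());
  BC.MIB->getWrittenRegs(Prev, WrittenRegs);
  if (SizeReg != BC.MIB->getNoRegister() && WrittenRegs.test(SizeReg))
    // Immediate of a MOVZXi-style write to SizeReg, std::nullopt otherwise.
    KnownSize = BC.MIB->extractMoveImmediate(Prev, SizeReg);
}
```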
I've added an explicit early return for clarity, although technically the virtual-method fallback handles X86 correctly anyway, which is why the existing test still passes. This makes the architecture-specific behavior **explicit and self-documenting**.
https://github.com/llvm/llvm-project/pull/154929