[llvm] [BOLT][AArch64] Enabling Inlining for Memcpy for AArch64 in BOLT (PR #154929)
via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 5 09:02:12 PDT 2025
yafet-a wrote:
> Did you check whether some of the inlined memcpys were actually executed? I want to make sure we haven't missed any corner cases.
Yes, I checked with the zstd smoke tests and inspected the `objdump` output. Our inlined memcpy pattern shows up at multiple addresses:
```assembly
178bc: ldr q0, [x1]
178c0: str q0, [x0]
40e4c: ldr q0, [x1]
40e50: str q0, [x0]
46e10: ldr q0, [x1]
46e14: str q0, [x0]
```
I then confirmed execution by profiling the `ZSTD_compressBlock_fast` function with `perf` and saw some of these instances actively running, e.g.:
```assembly
0.33 : 559bc: ldr q0, [x1] <- 0.33% CPU time
| 559c0: str q0, [x0] <- actively executing
```
So for this 16-byte copy, the binary executed the inlined `ldr q0`/`str q0` sequence that our change substituted for the original `memcpy` call.
https://github.com/llvm/llvm-project/pull/154929