[llvm] [BOLT][AArch64] Enabling Inlining for Memcpy for AArch64 in BOLT (PR #154929)

via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 5 09:02:12 PDT 2025


yafet-a wrote:

> Did you check whether some of the inlined memcpy's were actually executed? I want to avoid that we have missed some corner cases.

Yes. I ran the zstd smoke tests, inspected the `objdump` output, and found our inlined memcpy pattern at multiple addresses:

```assembly
178bc: ldr q0, [x1]
178c0: str q0, [x0]

40e4c: ldr q0, [x1]
40e50: str q0, [x0]

46e10: ldr q0, [x1]
46e14: str q0, [x0]
```
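For anyone who wants to repeat this static check, here is a minimal Python sketch. It is not part of this PR, and the helper name `find_inlined_memcpy16` is hypothetical. It scans disassembly lines for the 16-byte pattern above. It assumes the AAPCS64 argument registers of the replaced `memcpy(dst, src, n)` call (`x0` = destination, `x1` = source) and only recognizes this exact `q0` load/store pair, not sequences for other copy sizes:

```python
import re

# Matches either the simplified listing used above ("178bc: ldr q0, [x1]")
# or typical `objdump -d` lines, which also print the encoding word
# ("   178bc:\t3dc00020 \tldr\tq0, [x1]"). The exact format is an assumption.
INSN_RE = re.compile(
    r"^\s*(?P<addr>[0-9a-f]+):\s+(?:[0-9a-f]{8}\s+)?"
    r"(?P<op>ldr|str)\s+q0,\s*\[(?P<base>x\d+)\]\s*$"
)


def find_inlined_memcpy16(lines):
    """Return the addresses (hex strings) where `ldr q0, [x1]` is
    immediately followed by `str q0, [x0]`, i.e. a 16-byte copy from the
    source pointer (x1) to the destination pointer (x0)."""
    hits = []
    prev_ldr = None  # match for the previous instruction if it was ldr q0, [x1]
    for line in lines:
        if not line.strip():
            continue  # blank separators do not break a pair
        m = INSN_RE.match(line)
        if (prev_ldr is not None and m is not None
                and m["op"] == "str" and m["base"] == "x0"):
            hits.append(prev_ldr["addr"])
        # Only an `ldr q0, [x1]` can start a new pair; anything else resets.
        if m is not None and m["op"] == "ldr" and m["base"] == "x1":
            prev_ldr = m
        else:
            prev_ldr = None
    return hits
```

Feeding it the `objdump -d` output of the BOLTed binary would list candidate sites. Because it requires the two instructions to be adjacent, an unrelated instruction between the load and the store makes it skip that site.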

I then confirmed execution by profiling the `ZSTD_compressBlock_fast` function with `perf`, and saw some of these instances actively running, e.g.:

```assembly
  0.33 :   559bc:  ldr     q0, [x1]    <- 0.33% CPU time
       |   559c0:  str     q0, [x0]    <- actively executing
```

So in this 16-byte copy example, the code that actually runs is the inlined `ldr`/`str` pair that our change substituted for the original memcpy calls.
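The dynamic check can be scripted as well. Below is a minimal Python sketch, not part of this PR, with a hypothetical helper name `sampled_addresses`. It pulls per-instruction percentages out of `perf annotate --stdio`-style lines such as the one above, so that sampled addresses can be cross-checked against the inlined sites found in the disassembly. The line layout it expects is an assumption, since annotate output differs between perf versions and modes:

```python
import re

# Assumed layout: "<percent> :   <hex addr>:  <instruction>", e.g.
# "  0.33 :   559bc:  ldr     q0, [x1]". Lines without a leading
# percentage (such as the "|" continuation line) are ignored.
ANNOTATE_RE = re.compile(
    r"^\s*(?P<pct>\d+(?:\.\d+)?)\s*[:|]\s*(?P<addr>[0-9a-f]+):"
)


def sampled_addresses(lines, min_pct=0.0):
    """Map instruction address (hex string) -> sample percentage, keeping
    only lines whose percentage is strictly greater than min_pct."""
    samples = {}
    for line in lines:
        m = ANNOTATE_RE.match(line)
        if m and float(m["pct"]) > min_pct:
            samples[m["addr"]] = float(m["pct"])
    return samples
```

A zero percentage only means no samples landed on that instruction during the run. It does not prove the site is dead code, so for rarely executed paths a longer profile or a higher sampling frequency may be needed.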


https://github.com/llvm/llvm-project/pull/154929

