[llvm] [BOLT][AArch64] Fixed indirect call instrumentation snippet (PR #141918)
YongKang Zhu via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 5 10:51:43 PST 2025
yozhu wrote:
@yavtuk Thanks for the patch!
We have some similar internal changes that rewrite the indirect call instrumentation code sequence. I compared the two sets of changes and would like to check with you whether the differences below are correct:
> Instrumented indirect call:
> stp x0, x1, [sp, #-16]!
> mov x0, x8
> movk x1, #0x0, lsl #48
> movk x1, #0x0, lsl #32
> movk x1, #0x0, lsl #16
> movk x1, #0x0
> stp x0, x0, [sp, #-16]!
> adrp x8, __bolt_instr_ind_call_handler_func
> add x8, x8, #:lo12:__bolt_instr_ind_call_handler_func
> str x30, [sp, #-16]!
> blr x8 <--- call trampoline instr lib
> ldr x30, [sp], #16
> [x] ldp x0, x1, [sp], #16
> [x] mov x8, x0 <---- restore original target
> ldp x0, x1, [sp], #16
> blr x8 <--- original indirect call instruction
There is a typo in the `stp` instruction immediately after the four `movk` instructions - it should be `stp x0, x1, [sp, #-16]!`, right?
For the call-to-trampoline sequence, we can also just use `x0` to hold the trampoline address. It doesn't need to be `x8`, the register originally used for the indirect call. This doesn't affect functionality or the length of the code sequence, though.
The two instructions marked with `[x]` can be replaced by a single instruction, `ldr x8, [sp], #0x10`, because we only need to restore `x8` here; `x0` and `x1` are reloaded from the stack by the following `ldp`, before the final `blr x8`.
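Putting the points above together, the call-site sequence would look roughly like this. It is an untested sketch: it keeps `x8` for the trampoline address and leaves the `movk` immediates as they appear in the quoted sequence.

```
stp x0, x1, [sp, #-16]!
mov x0, x8
movk x1, #0x0, lsl #48
movk x1, #0x0, lsl #32
movk x1, #0x0, lsl #16
movk x1, #0x0
stp x0, x1, [sp, #-16]!      // was "stp x0, x0"
adrp x8, __bolt_instr_ind_call_handler_func
add x8, x8, #:lo12:__bolt_instr_ind_call_handler_func
str x30, [sp, #-16]!
blr x8                       // call trampoline in instr lib
ldr x30, [sp], #16
ldr x8, [sp], #16            // reload saved target and pop the 16-byte slot
ldp x0, x1, [sp], #16        // restore original x0, x1
blr x8                       // original indirect call instruction
```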
> // don't touch regs besides x0, x1
> __bolt_instr_ind_call_handler: (exit snippet)
> [x] ldr x1, sp, 16
> msr nzcv, x1
> [x] ldp x0, x1, [sp], #16
> ret <---- return to original function with indirect call
>
> __bolt_instr_ind_call_handler_func: (entry snippet)
> [x] stp x0, x1, [sp, #-16]!
> mrs x1, nzcv
> [x] str x1, [sp, #-16]!
> adrp x0, __bolt_instr_ind_call_handler
> add x0, x0, #:lo12:__bolt_instr_ind_call_handler
> ldr x0, [x0]
> cmp x0, #0x0
> b.eq __bolt_instr_ind_call_handler
> str x30, [sp, #-16]!
> blr x0 <--- runtime lib store/load all regs
> ldr x30, [sp], #16
> b __bolt_instr_ind_call_handler
The `adrp+add+ldr` sequence can be changed to an `adrp+ldr` sequence. The unsigned-offset form of a 64-bit `ldr` requires the offset to be a multiple of 8, so if needed, we can make `ind_call_handler` at least 8-byte aligned:
adrp x0, __bolt_instr_ind_call_handler
ldr x0, [x0, #:lo12:__bolt_instr_ind_call_handler]
The `str x1, [sp, #-16]!` instruction immediately after `mrs x1, nzcv` can be removed, and so can the `ldr x1, sp, 16` instruction at the start of the exit snippet: the indirect call handler defined in the runtime library saves and restores all registers, so we don't need to save and restore `x1` here.
For a similar reason, we can also remove the `stp` at the beginning of the entry snippet and the `ldp` before the `ret` in the exit snippet.
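With the `adrp+ldr` change applied and the four `[x]` instructions dropped, the two snippets would shrink to roughly the following. This is an untested sketch: it assumes the runtime library handler preserves `x1` (which holds the saved `nzcv`) across the `blr x0`, and that the call site reloads `x0` and `x1` from its own stack slots after the trampoline returns, as in the instrumented sequence quoted above. Label names are kept as they appear in the quoted snippets.

```
// don't touch regs besides x0, x1
__bolt_instr_ind_call_handler: (exit snippet)
    msr nzcv, x1             // restore flags saved by the entry snippet
    ret                      // return to original function with indirect call

__bolt_instr_ind_call_handler_func: (entry snippet)
    mrs x1, nzcv             // keep flags in x1; no stack save needed
    adrp x0, __bolt_instr_ind_call_handler
    ldr x0, [x0, #:lo12:__bolt_instr_ind_call_handler]
    cmp x0, #0x0
    b.eq __bolt_instr_ind_call_handler
    str x30, [sp, #-16]!
    blr x0                   // runtime lib store/load all regs
    ldr x30, [sp], #16
    b __bolt_instr_ind_call_handler
```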
https://github.com/llvm/llvm-project/pull/141918