[llvm] [BOLT][AArch64] Fixed indirect call instrumentation snippet (PR #141918)

YongKang Zhu via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 5 10:51:43 PST 2025


yozhu wrote:

@yavtuk Thanks for the patch!

We have similar internal changes that rewrite the indirect call instrumentation code sequence. I compared the two and would like to check with you whether the following differences are correct:

> Instrumented indirect call:
>         stp     x0, x1, [sp, #-16]!
>         mov     x0, x8
>         movk    x1, #0x0, lsl #48
>         movk    x1, #0x0, lsl #32
>         movk    x1, #0x0, lsl #16
>         movk    x1, #0x0
>         stp     x0, x0, [sp, #-16]!
>         adrp    x8, __bolt_instr_ind_call_handler_func
>         add     x8, x8, #:lo12:__bolt_instr_ind_call_handler_func
>         str     x30, [sp, #-16]!
>         blr     x8       <--- call trampoline instr lib
>         ldr     x30, [sp], #16
>   [x] ldp     x0, x1, [sp], #16
>   [x] mov     x8, x0   <---- restore original target
>        ldp     x0, x1, [sp], #16
>         blr     x8       <--- original indirect call instruction

There is a typo in the `stp` instruction immediately after the four `movk` instructions - it should be `stp x0, x1, [sp, #-16]!`, right?

For the call-to-trampoline sequence, we could also just use `x0` to hold the trampoline address; it doesn't need to be `x8`, the register originally used for the indirect call. This doesn't affect functionality or code sequence length, though.

The two instructions marked with `[x]` in the instrumented call sequence above can be replaced by a single `ldr x8, [sp], #0x10`, because at that point we only need to restore `x8`; `x0` and `x1` are loaded from the stack right after it, before the final `blr x8`.
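To make the equivalence concrete, here is a toy Python model of the two restore sequences. It is only a sketch and not BOLT code: the stack is a list of 16-byte slots, the register file is a dict, all values are placeholder strings, and it assumes the `stp` typo mentioned above is fixed so that the second slot holds the call target in its low half.

```python
# Toy model: each stack slot is a (low 8 bytes, high 8 bytes) pair,
# and the end of the list is the top of the stack.

def restore_original(regs, stack):
    """Sequence from the patch: ldp x0, x1 / mov x8, x0 / ldp x0, x1."""
    regs, stack = dict(regs), list(stack)
    regs["x0"], regs["x1"] = stack.pop()   # ldp x0, x1, [sp], #16
    regs["x8"] = regs["x0"]                # mov x8, x0
    regs["x0"], regs["x1"] = stack.pop()   # ldp x0, x1, [sp], #16
    return regs, stack

def restore_suggested(regs, stack):
    """Suggested sequence: ldr x8, [sp], #16 / ldp x0, x1, [sp], #16."""
    regs, stack = dict(regs), list(stack)
    regs["x8"], _ = stack.pop()            # ldr x8, [sp], #16 (reads the low half only)
    regs["x0"], regs["x1"] = stack.pop()   # ldp x0, x1, [sp], #16
    return regs, stack

# State after the trampoline returns (placeholder values, assumed for illustration).
stack = [("orig_x0", "orig_x1"), ("call_target", "call_id")]
regs = {"x0": "clobbered", "x1": "clobbered", "x8": "trampoline"}

# Both sequences leave the same registers and the same (empty) stack.
assert restore_original(regs, stack) == restore_suggested(regs, stack)
```

In this model the first `ldp` in the original sequence only matters for its `x0` half, which is why one post-indexed `ldr` into `x8` is enough.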

> // don't touch regs besides x0, x1
> __bolt_instr_ind_call_handler:  (exit snippet)
>   [x] ldr     x1, sp, 16
>         msr     nzcv, x1
>   [x]  ldp     x0, x1, [sp], #16
>         ret     <---- return to original function with indirect call
> 
> __bolt_instr_ind_call_handler_func: (entry snippet)
>   [x] stp     x0, x1, [sp, #-16]!
>         mrs     x1, nzcv
>   [x]  str     x1, [sp, #-16]!
>         adrp    x0, __bolt_instr_ind_call_handler
>         add     x0, x0, #:lo12:__bolt_instr_ind_call_handler
>         ldr     x0, [x0]
>         cmp     x0, #0x0
>         b.eq    __bolt_instr_ind_call_handler
>         str     x30, [sp, #-16]!
>         blr     x0     <--- runtime lib store/load all regs
>         ldr     x30, [sp], #16
>         b       __bolt_instr_ind_call_handler
> ```

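The `adrp+add+ldr` sequence can be shortened to an `adrp+ldr` sequence. The unsigned-offset form of a 64-bit `ldr` scales its immediate by 8, so this requires the low 12 bits of the symbol address to be a multiple of 8; if needed, we can make `__bolt_instr_ind_call_handler` at least 8-byte aligned: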

          adrp    x0, __bolt_instr_ind_call_handler
          ldr     x0, [x0, #:lo12:__bolt_instr_ind_call_handler]
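The alignment requirement can be illustrated with a small Python sketch of how the page base and the scaled offset are derived from a symbol address. The helper names and the example addresses are hypothetical and chosen only for illustration; the sketch mirrors the arithmetic, not the actual linker or BOLT implementation.

```python
def adrp_page(address):
    """4 KiB page base that `adrp` materializes for this address."""
    return address & ~0xFFF

def ldr_x_lo12_imm(address):
    """Scaled imm12 field for a 64-bit `ldr xN, [xM, #:lo12:sym]`.

    The unsigned-offset form of a 64-bit LDR encodes the byte offset
    divided by 8, so the page offset must be a multiple of 8.
    """
    page_offset = address & 0xFFF
    if page_offset % 8 != 0:
        raise ValueError(f"page offset {page_offset:#x} is not 8-byte aligned")
    return page_offset >> 3

# Hypothetical, 8-byte-aligned address for the handler pointer.
addr = 0x412348
imm = ldr_x_lo12_imm(addr)

# adrp supplies the page, and the scaled immediate supplies the rest.
assert adrp_page(addr) + imm * 8 == addr
```

With a misaligned address such as `0x41234C`, `ldr_x_lo12_imm` raises, which corresponds to the case where the separate `add` would still be needed.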

The `str x1, [sp, #-16]!` instruction immediately after `mrs x1, nzcv` can be removed, and so can the `ldr x1, sp, 16` instruction at the start of the exit snippet: the indirect call handler defined in the runtime library saves and restores all registers, so we don't need to save and restore `x1` here.

For a similar reason, we can also remove the `stp` at the beginning of the entry snippet and the `ldp` before the `ret` in the exit snippet.

https://github.com/llvm/llvm-project/pull/141918

