[llvm] [llvm-exegesis] [AArch64] Add support for Load Instructions in subprocess execution mode (PR #144895)

Lakshay Kumar via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 7 05:37:41 PDT 2025


lakshayk-nv wrote:


In its current state, this PR enables the following:
- Manual memory snippets are now supported for AArch64 in llvm-exegesis.
- Load instructions no longer segfault: registers are now initialized with valid memory addresses in subprocess execution mode.
- Basic subprocess execution works: load instructions (e.g., LD1B) execute without errors in subprocess mode.
- Auxiliary memory mmap and manual snippet mmap both work, using anonymous memory mappings.
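
As an illustration only (this is not code from the PR), the sketch below shows what an anonymous shared mapping is: a page that is not backed by any file descriptor. It uses Python's standard `mmap` module on Linux; the helper name `make_anonymous_page` and the 4096-byte size are assumptions chosen for the example.

```python
import mmap

PAGE_SIZE = 4096  # assumed page size, chosen for this example


def make_anonymous_page():
    """Map one read/write page that is not backed by any file (fileno=-1)."""
    return mmap.mmap(
        -1,
        PAGE_SIZE,
        flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )


def roundtrip(value):
    """Store a 64-bit value in the page and read it back."""
    page = make_anonymous_page()
    try:
        page[:8] = value.to_bytes(8, "little")
        return int.from_bytes(page[:8], "little")
    finally:
        page.close()
```

A file-backed mapping would pass a real file descriptor instead of `-1`; the anonymous form needs no file, and its contents are visible only to processes that inherit the mapping.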

Limitations:
- Measurements are unreliable because the ioctl syscall fails: the perf event file descriptor is not populated at the auxiliary memory location where it is expected. (Inclined to take this up in the future.)
- (FYI) All memory mappings are anonymous. This is not a functional problem. (Inclined to take this up in the future.)
- The temporary fix of loading the first register with a valid memory address needs to be generalized so that registers are loaded based on the instruction's structure. (Out of scope for this PR; to be resolved in a future PR.)


For reference, here is the output for LD1B in subprocess mode when benchmarking latency:
```yaml
$ build/bin/llvm-exegesis -mode=latency --execution-mode=subprocess --opcode-name=LD1B --debug-only="preview-gen-assembly"  
Warning: generateMmapAuxMem using anonymous mapping
Warning: setStackRegisterToAuxMem called but not required for AArch64
Warning: configurePerfCounter ioctl syscall failing
Warning: configurePerfCounter ioctl syscall failing
Warning: generateMmapAuxMem using anonymous mapping
Warning: setStackRegisterToAuxMem called but not required for AArch64
Warning: configurePerfCounter ioctl syscall failing
Warning: configurePerfCounter ioctl syscall failing
Generated assembly snippet:
'''
0:      fc1f0fed        str     d13, [sp, #-16]!
4:      f90007f7        str     x23, [sp, #8]
8:      b26f77e0        mov     x0, #140737488224256
c:      d2820001        mov     x1, #4096
10:     d2800062        mov     x2, #3
14:     d2800423        mov     x3, #33
18:     f2a00203        movk    x3, #16, lsl #16
1c:     92800004        mov     x4, #-1
20:     d2800005        mov     x5, #0
24:     d2801bc8        mov     x8, #222
28:     d4000001        svc     #0
2c:     f81f0fe0        str     x0, [sp, #-16]!
30:     2518e3e6        ptrue   p6.b
34:     f84107ef        ldr     x15, [sp], #16
38:     d2800017        mov     x23, #0
3c:     2518e3e0        ptrue   p0.b
40:     25f8c00d        mov     z13.d, #0
44:     f81f0fe8        str     x8, [sp, #-16]!
48:     f81f0fe0        str     x0, [sp, #-16]!
4c:     f81f0fe1        str     x1, [sp, #-16]!
50:     f81f0fe2        str     x2, [sp, #-16]!
54:     b26f77f0        mov     x16, #140737488224256
58:     b9400200        ldr     w0, [x16]
5c:     d2848061        mov     x1, #9219
60:     d2800022        mov     x2, #1
64:     d28003a8        mov     x8, #29
68:     d4000001        svc     #0
6c:     f84107e2        ldr     x2, [sp], #16
70:     f84107e1        ldr     x1, [sp], #16
74:     f84107e0        ldr     x0, [sp], #16
78:     f84107e8        ldr     x8, [sp], #16
7c:     a41759f0        ld1b    { z16.b }, p6/z, [x15, x23]
80:     244d0207        cmphs   p7.h, p0/z, z16.h, z13.h
84:     a41759f0        ld1b    { z16.b }, p6/z, [x15, x23]
88:     244d0207        cmphs   p7.h, p0/z, z16.h, z13.h
...     (9994 more instructions)
9cb4:   a41759f0        ld1b    { z16.b }, p6/z, [x15, x23]
9cb8:   244d0207        cmphs   p7.h, p0/z, z16.h, z13.h
9cbc:   b26f77f0        mov     x16, #140737488224256
9cc0:   b9400200        ldr     w0, [x16]
9cc4:   d2848021        mov     x1, #9217
9cc8:   d2800022        mov     x2, #1
9ccc:   d28003a8        mov     x8, #29
9cd0:   d4000001        svc     #0
9cd4:   d2800000        mov     x0, #0
9cd8:   d2800ba8        mov     x8, #93
9cdc:   d4000001        svc     #0
9ce0:   f94007f7        ldr     x23, [sp, #8]
9ce4:   fc4107ed        ldr     d13, [sp], #16
9ce8:   d65f03c0        ret
'''
---
mode:            latency
key:
  instructions:
    - 'LD1B Z16 P6 X15 X23'
    - 'CMPHS_PPzZZ_H P7 P0 Z16 Z13'
  config:          ''
  register_initial_values:
    - 'P6=0x0'
    - 'X15=0x0'
    - 'X23=0x0'
    - 'P0=0x0'
    - 'Z13=0x0'
cpu_name:        neoverse-v2
llvm_triple:     aarch64-unknown-linux-gnu
min_instructions: 10000
measurements:
  - { key: latency, value: 7.2836, per_snippet_value: 14.5672, validation_counters: {} }
error:           ''
info:            Repeating two instructions
assembled_snippet: ED0F1FFCF70700F9E0776FB2010082D2620080D2230480D20302A0F204008092050080D2C81B80D2010000D4E00F1FF8E6E31825EF0741F8170080D2E0E318250DC0F825E80F1FF8E00F1FF8E10F1FF8E20F1FF8F0776FB2000240B9618084D2220080D2A80380D2010000D4E20741F8E10741F8E00741F8E80741F8F05917A407024D24F05917A407024D24F05917A407024D24F05917A407024D24F0776FB2000240B9218084D2220080D2A80380D2010000D4000080D2A80B80D2010000D4F70740F9ED0741FCC0035FD6
...
```
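
To make the syscall sequences in the listing easier to read, the following sketch decodes the immediates loaded into `x8` and the argument registers. It assumes the standard AArch64 Linux syscall numbers and the usual Linux values for the `mmap` and perf `ioctl` constants; the helper names (`decode_bits`, `describe_mmap`) are hypothetical and not part of llvm-exegesis.

```python
# Immediates are copied from the disassembly; names follow the Linux UAPI headers.
SYSCALLS = {29: "ioctl", 93: "exit", 222: "mmap"}  # AArch64 syscall numbers
PROT_BITS = {0x1: "PROT_READ", 0x2: "PROT_WRITE", 0x4: "PROT_EXEC"}
MAP_BITS = {
    0x01: "MAP_SHARED",
    0x02: "MAP_PRIVATE",
    0x10: "MAP_FIXED",
    0x20: "MAP_ANONYMOUS",
    0x100000: "MAP_FIXED_NOREPLACE",
}
PERF_IOCTLS = {
    0x2400: "PERF_EVENT_IOC_ENABLE",
    0x2401: "PERF_EVENT_IOC_DISABLE",
    0x2403: "PERF_EVENT_IOC_RESET",
}


def decode_bits(value, table):
    """Join the names of all flag bits set in value, lowest bit first."""
    return "|".join(name for bit, name in sorted(table.items()) if value & bit)


def describe_mmap(addr, length, prot, flags, fd, offset):
    """Render the six mmap arguments (x0..x5) in symbolic form."""
    return "mmap(addr={:#x}, length={}, prot={}, flags={}, fd={}, offset={})".format(
        addr, length, decode_bits(prot, PROT_BITS),
        decode_bits(flags, MAP_BITS), fd, offset)


# x3 is built by `mov x3, #33` followed by `movk x3, #16, lsl #16`.
mmap_flags = 33 | (16 << 16)

print(SYSCALLS[222], "->", describe_mmap(140737488224256, 4096, 3, mmap_flags, -1, 0))
print(SYSCALLS[29], "->", PERF_IOCTLS[9219], "(snippet prologue)")
print(SYSCALLS[29], "->", PERF_IOCTLS[9217], "(snippet epilogue)")
print(SYSCALLS[93], "-> status 0")
```

Under these assumptions, the first `svc` is an anonymous, shared `mmap` of one 4096-byte page at a fixed address, which matches the "using anonymous mapping" warning. The two `ioctl` calls take their file descriptor from `ldr w0, [x16]`, that is, from the same address that was mapped; that address is where the limitation above says the perf event file descriptor is not populated.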

https://github.com/llvm/llvm-project/pull/144895

