[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

Peilin Ye via cfe-commits cfe-commits at lists.llvm.org
Wed Oct 23 18:23:51 PDT 2024


peilin-ye wrote:

@yonghong-song, back to your example:

> ```
> $ cat t4.c
> short foo(short *ptr) {
>   return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
> }
> ```
> ```
> 0000000000000000 <foo>:
>        0:       e9 10 00 00 00 00 00 00 w0 = load_acquire((u16 *)(r1 + 0x0))
>        1:       95 00 00 00 00 00 00 00 exit
> ```

Assume that the above `load_acquire` zero-extends. If the calling convention here is "`foo()`'s return value is represented by the entire `w0` half-register", then yes, we're doing it wrong. However, if I do this:

```c
// clang --target=bpf -mcpu=v4 -O2 -g -c -o test.bpf.o test.bpf.c
__attribute__((noinline)) short bar(short *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}

long foo(short *ptr) {
    return bar(ptr);
}
```
```
0000000000000000 <bar>:
;     return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
       0:	cb 10 00 00 10 00 00 00	w0 = load_acquire((u16 *)(r1 + 0x0))
       1:	95 00 00 00 00 00 00 00	exit

0000000000000010 <foo>:
;     return bar(ptr);
       2:	85 10 00 00 ff ff ff ff	call -0x1
       3:	bf 00 10 00 00 00 00 00	r0 = (s16)r0
       4:	95 00 00 00 00 00 00 00	exit
```
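Things still work, because `foo()` treats `bar()`'s return value as `s16`: it sign-extends the low 16 bits itself (`r0 = (s16)r0`) instead of relying on the upper bits of `w0`.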
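A minimal C sketch of that sequence follows. It assumes, as above, that `load_acquire` zero-extends. The function names are hypothetical and only mimic the instructions in the disassembly. They are not BPF or kernel APIs, and the acquire ordering itself is not modeled.

```c
#include <stdint.h>

/* Hypothetical model of `w0 = load_acquire((u16 *)(r1 + 0x0))`, ASSUMING it
 * zero-extends: the halfword lands in bits 0..15 and all higher bits are 0.
 * Memory ordering is not modeled here. */
static uint64_t bpf_load_acquire_u16_model(const uint16_t *ptr)
{
    return (uint64_t)*ptr;
}

/* Model of the caller's `r0 = (s16)r0`: take bits 0..15 of r0 and
 * sign-extend them to 64 bits. Done arithmetically so that no
 * implementation-defined narrowing conversion is involved. */
static int64_t bpf_sext16_model(uint64_t r0)
{
    int64_t v = (int64_t)(r0 & 0xFFFFu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* For contrast: a caller that treats all of w0 as the return value,
 * i.e. the convention under which the zero-extending load would be wrong. */
static int64_t naive_caller_model(uint64_t r0)
{
    return (int64_t)(uint32_t)r0;
}
```

With memory holding the bit pattern of `(short)-2` (`0xFFFE`), the modeled load yields `0xFFFE`, `bpf_sext16_model()` turns that into `-2` as `foo()` expects, and `naive_caller_model()` would report `65534`. The mismatch only arises if the caller does not re-extend.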
- - -
It's the same story for ARM64.  If I compile the above program with `--target=aarch64`:
```
0000000000000000 <bar>:
       0: 48dffc00     	ldarh	w0, [x0]
       4: d65f03c0     	ret

0000000000000008 <foo>:
       8: a9bf7bfd     	stp	x29, x30, [sp, #-0x10]!
       c: 910003fd     	mov	x29, sp
      10: 94000000     	bl	0x10 <foo+0x8>
      14: 93403c00     	sxth	x0, w0
      18: a8c17bfd     	ldp	x29, x30, [sp], #0x10
      1c: d65f03c0     	ret
```
`ldarh` zero-extends the halfword:

> Load-Acquire Register Halfword derives an address from a base register value, loads a halfword from memory, zero-extends it, and writes it to a register.

Then the caller, `foo()`, does a `sxth` to sign-extend the halfword:

> Sign Extend Halfword extracts a 16-bit value, sign-extends it to the size of the register, and writes the result to the destination register.
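
The same reasoning can be checked exhaustively in C. This sketch models `ldarh` as a plain zero-extending halfword load (its acquire ordering is ignored) and `sxth` as arithmetic sign extension from bit 15. The helper names are hypothetical. It checks that, for every `short` value that could be in memory, zero-extending in the callee and then sign-extending in the caller reproduces the original value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of `ldarh w0, [x0]`: load a halfword and zero-extend
 * it into w0. Acquire semantics are not modeled. */
static uint32_t ldarh_model(uint16_t halfword)
{
    return (uint32_t)halfword;
}

/* Hypothetical model of `sxth x0, w0`: sign-extend bits 0..15 of w0 into
 * the 64-bit x0. Bits 16..31 of w0 are ignored. */
static int64_t sxth_model(uint32_t w0)
{
    int64_t v = (int64_t)(w0 & 0xFFFFu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* Returns true if ldarh followed by sxth yields the original value for
 * every 16-bit signed integer. */
static bool zext_then_sext_roundtrips(void)
{
    for (int32_t v = INT16_MIN; v <= INT16_MAX; v++) {
        uint16_t bits = (uint16_t)v; /* two's-complement pattern in memory */
        if (sxth_model(ldarh_model(bits)) != (int64_t)v)
            return false;
    }
    return true;
}
```

`zext_then_sext_roundtrips()` returns true. Because `sxth` reads only bits 0..15, the callee's zero extension does not affect the caller's result.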
- - -
Is it fair to say that your example demonstrated expected behavior?

https://github.com/llvm/llvm-project/pull/108636

