[llvm] [RFC][BPF] Do atomic_fetch_*() pattern matching with memory ordering (PR #107343)

Thu Sep 5 16:46:06 PDT 2024

eddyz87 wrote:

So, basically we want to generate `lock *(u64 *)(rX + ...) += rY` for `__c11_atomic_fetch_add(..., memory_order_relaxed)` and `rX = atomic_fetch_add(...)` for everything else.
`memory_order_relaxed` corresponds to [monotonic](https://llvm.org/docs/Atomics.html#monotonic) LLVM IR ordering.
Based on the referenced mailing list discussion, this is done to allow the ARM JIT to generate the [`LDADD`](https://developer.arm.com/documentation/ddi0602/2024-06/Base-Instructions/LDADD--LDADDA--LDADDAL--LDADDL--Atomic-add-on-word-or-doubleword-in-memory-) instruction (without the `L`, `A`, or `AL` suffixes) for this C interface (which it currently does for `lock ...` instructions).
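To make the two cases concrete, here is a minimal C sketch. It uses the portable `<stdatomic.h>` spelling (`atomic_fetch_add_explicit`) instead of the `__c11_atomic_fetch_add` builtin, and the function names are hypothetical. The comments about which BPF form each function would lower to restate the intent described above. They are not verified compiler output.

```c
#include <stdatomic.h>

/* Hypothetical helper: relaxed ordering, result discarded.
 * Per the description above, this is the case intended to become
 * `lock *(u64 *)(rX + ...) += rY` (assumption, not checked output). */
void counter_add_relaxed(_Atomic long *counter, long delta)
{
    (void)atomic_fetch_add_explicit(counter, delta, memory_order_relaxed);
}

/* Hypothetical helper: any stronger ordering, here seq_cst.
 * Per the description above, this is intended to stay in the
 * `rX = atomic_fetch_add(...)` form (assumption, not checked output). */
void counter_add_seq_cst(_Atomic long *counter, long delta)
{
    (void)atomic_fetch_add_explicit(counter, delta, memory_order_seq_cst);
}
```

Both functions have the same effect on the counter. Only the ordering argument differs, and that argument is what the pattern matching would key on.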

I was unable to figure out from the ARM documentation whether `LDADD` is [monotonic](https://llvm.org/docs/Atomics.html#monotonic) or [unordered](https://llvm.org/docs/Atomics.html#unordered). However, the test below shows that at least we are on the same page with the LLVM ARM backend:

```c
$ cat test2.c
void f1(_Atomic long *i) { __c11_atomic_fetch_add(i, 10, __ATOMIC_RELAXED); }
void f2(_Atomic long *i) { __c11_atomic_fetch_add(i, 10, __ATOMIC_CONSUME); }
void f3(_Atomic long *i) { __c11_atomic_fetch_add(i, 10, __ATOMIC_ACQUIRE); }
void f4(_Atomic long *i) { __c11_atomic_fetch_add(i, 10, __ATOMIC_RELEASE); }
void f5(_Atomic long *i) { __c11_atomic_fetch_add(i, 10, __ATOMIC_ACQ_REL); }
void f6(_Atomic long *i) { __c11_atomic_fetch_add(i, 10, __ATOMIC_SEQ_CST); }
```
```bash
$ clang --target=aarch64 -march=armv8.1-a -O2 test2.c -c -o - | llvm-objdump -Sdr -
```
```asm
<stdin>:	file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <f1>:
       0: 52800148     	mov	w8, #0xa                // =10
       4: f8280008     	ldadd	x8, x8, [x0]
       8: d65f03c0     	ret

000000000000000c <f2>:
       c: 52800148     	mov	w8, #0xa                // =10
      10: f8a80008     	ldadda	x8, x8, [x0]
      14: d65f03c0     	ret

0000000000000018 <f3>:
      18: 52800148     	mov	w8, #0xa                // =10
      1c: f8a80008     	ldadda	x8, x8, [x0]
      20: d65f03c0     	ret

0000000000000024 <f4>:
      24: 52800148     	mov	w8, #0xa                // =10
      28: f8680008     	ldaddl	x8, x8, [x0]
      2c: d65f03c0     	ret

0000000000000030 <f5>:
      30: 52800148     	mov	w8, #0xa                // =10
      34: f8e80008     	ldaddal	x8, x8, [x0]
      38: d65f03c0     	ret

000000000000003c <f6>:
      3c: 52800148     	mov	w8, #0xa                // =10
      40: f8e80008     	ldaddal	x8, x8, [x0]
      44: d65f03c0     	ret
```

So, I think we are good.
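
For reference, the mapping observed in the listing above can be summarized as a small lookup. This is only a restatement of that one `clang --target=aarch64 -march=armv8.1-a -O2` run. The function name `ldadd_variant` is hypothetical and is not part of LLVM.

```c
#include <stdatomic.h>

/* Hypothetical helper: returns the AArch64 mnemonic that the listing
 * above shows for a fetch-add with the given C11 memory order. */
const char *ldadd_variant(memory_order order)
{
    switch (order) {
    case memory_order_relaxed:
        return "ldadd";    /* f1 */
    case memory_order_consume:
    case memory_order_acquire:
        return "ldadda";   /* f2, f3 */
    case memory_order_release:
        return "ldaddl";   /* f4 */
    case memory_order_acq_rel:
    case memory_order_seq_cst:
        return "ldaddal";  /* f5, f6 */
    }
    return "unknown";      /* not reached for valid memory_order values */
}
```

Only `memory_order_relaxed` yields the plain `LDADD` form, which matches the instruction the ARM JIT emits for `lock ...`.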

Question: should the BPF backend report an error if `__ATOMIC_{CONSUME,ACQUIRE,RELEASE,ACQ_REL}` is used?
LLVM documentation [allows this](https://llvm.org/docs/Atomics.html#unordered).

https://github.com/llvm/llvm-project/pull/107343