[llvm] BPF: Generate locked insn for __sync_fetch_and_add() with cpu v1/v2 (PR #106494)

Thu Sep 12 11:29:26 PDT 2024

yonghong-song wrote:

@th0rex thanks for the reporting. The newer behavior in llvm20 is correct. The old one has some issues and some instructions supposed not support in v1/v2 but somehow we did it and their usage is not intentional. But unfortunately there is no way to do backport. See below for some details.

```
$ cat t.c                                                                                                                         
int f1(int *i) {                                                                                                                                             
   return __sync_fetch_and_add(i, 10);                                                                                                                       
}
$ ~/work/llvm-project/llvm/build.15/install/bin/clang --version
clang version 15.0.7 (https://github.com/llvm/llvm-project.git 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/yhs/work/llvm-project/llvm/build.15/install/bin
$ ~/work/llvm-project/llvm/build.15/install/bin/clang --target=bpf -O2 -mcpu=v1 -c t.c                                 
fatal error: error in backend: Invalid usage of the XADD return value
```
You can see a compiler error will be emitted for cpu v1 if the return value is used. The same will be for v2. For v3, it will be okay since v3 enables 32-bit subregister and we allows new insn atomic_fetch_add. for cpu v1/v2, the atomic_fetch_add is not available.

But for 64bit __sync_fetch_and_add(), it seems going through.
```
$ cat t1.c
long f1(long *i) {
   return __sync_fetch_and_add(i, 10);
}
$ ~/work/llvm-project/llvm/build.15/install/bin/clang --target=bpf -O2 -mcpu=v1 -c t1.c
$ ~/work/llvm-project/llvm/build.15/install/bin/llvm-objdump -d t1.o

t1.o:   file format elf64-bpf

Disassembly of section .text:

0000000000000000 <f1>:
       0:       b7 00 00 00 0a 00 00 00 r0 = 10
       1:       db 01 00 00 01 00 00 00 r0 = atomic_fetch_add((u64 *)(r1 + 0), r0)
       2:       95 00 00 00 00 00 00 00 exit
```

So there is a discrepancy between 32bit and 64bit. But intention is for <= llvm19, for v1/v2, no return value should be used (using locked insn) and compiler should issue an error when it does use return value. For v3, it is okay to use return value (using atomic_fetch_add insn).

llvm20 fixed this issue.
```
$ cat t.c
int f1(int *i) {
   return __sync_fetch_and_add(i, 10);
}
[yhs at devbig309.ftw3 ~/tmp4]$ cat t1.c
long f1(long *i) {
  return __sync_fetch_and_add(i, 10);
}
$ ~/work/llvm-project/llvm/build.20/install/bin/clang --target=bpf -O2 -mcpu=v1 -c t.c
t.c:1:5: error: Invalid usage of the XADD return value
    1 | int f1(int *i) {
      |     ^
1 error generated.
$ ~/work/llvm-project/llvm/build.20/install/bin/clang --target=bpf -O2 -mcpu=v1 -c t1.c
t1.c:1:6: error: Invalid usage of the XADD return value
    1 | long f1(long *i) {
      |      ^
1 error generated.
$ ~/work/llvm-project/llvm/build.20/install/bin/clang --target=bpf -O2 -mcpu=v3 -c t.c
$ ~/work/llvm-project/llvm/build.20/install/bin/clang --target=bpf -O2 -mcpu=v3 -c t1.c
$
```

So for 64bit insn, using inline asm with atomc_fetch_add() is the right choice if you intends to use cpu v1/v2.

https://github.com/llvm/llvm-project/pull/106494