[llvm] BPF: Ensure __sync_fetch_and_add() always generate atomic_fetch_add insn (PR #101428)
via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 26 18:57:05 PDT 2024
eddyz87 wrote:
@brycekahle,
> It turns out what breaks inline assembly with the default target is trying to use the bpf registers by name. Since the example you gave does not do that, it does actually compile.
For asm body or for asm parameters?
I was able to get an error from `clang` when specifying register names in parameters. Clang does some limited target-specific checks for inline assembly parameters, e.g. see [virtual TargetInfo::isValidGCCRegisterName](https://github.com/llvm/llvm-project/blob/085587e1a9cdf1625efca61153a7dfe30946a6ce/clang/include/clang/Basic/TargetInfo.h#L1064), which is used for semantic checks ([here](https://github.com/llvm/llvm-project/blob/1200d35e0b1bd33cf6b06c185384f78226b619ae/clang/lib/Basic/TargetInfo.cpp#L631) and [here](https://github.com/llvm/llvm-project/blob/1200d35e0b1bd33cf6b06c185384f78226b619ae/clang/lib/Sema/SemaStmtAsm.cpp#L469)). And this is why I said that this `clang --target=native` + `llc -march=bpf` arrangement is strange and is ultimately a hack.
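To illustrate (this is a sketch, not the exact test I ran; the file name `check_regs.c` is hypothetical, and I'm assuming the BPF target accepts `r0`-`r10` as GCC register names while the host target does not):
```
$ cat check_regs.c
void probe(unsigned long v)
{
	/* "r1" has to pass TargetInfo::isValidGCCRegisterName for the current target */
	register unsigned long a asm("r1") = v;
	asm volatile("" :: "r"(a));
}
$ clang --target=bpf -O2 -c check_regs.c -o /dev/null   # accepted
$ clang -O2 -c check_regs.c -o /dev/null                # rejected by the host target: unknown register name
```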
> I still think generating v3 instructions when v1 is explicitly asked for, is a giant footgun that is going to trip up a lot of folks. Could you do this new behavior only when v3 (or later) is used?
Agree that this is a footgun. I think we should report an error when `__sync_fetch_and_add` is used with cpu v1, and probably provide an intrinsic to generate `xadd`. However, the following code works:
```
$ cat test3.c
__attribute__((always_inline))
void xadd(unsigned long *p, unsigned long v)
{
	asm volatile("lock *(u64 *)(%[p] + 0) += %[v];" :: [p]"r"(p), [v]"r"(v): "memory");
}

void foo(unsigned long *p, unsigned long v)
{
	xadd(p, v);
}
$ clang -O2 -S -emit-llvm -o - test3.c | llc -march=bpf -mcpu=v1 -o - -filetype=obj | llvm-objdump --no-show-raw-insn -d -
<stdin>: file format elf64-bpf
Disassembly of section .text:
0000000000000000 <xadd>:
0: lock *(u64 *)(r1 + 0x0) += r2
1: exit
0000000000000010 <foo>:
2: lock *(u64 *)(r1 + 0x0) += r2
3: exit
```
So an intrinsic is optional.
https://github.com/llvm/llvm-project/pull/101428