[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

Mon Oct 7 22:57:58 PDT 2024

peilin-ye wrote:

Thanks for all your suggestions!  Main changes in patchset v5:

1. use `-mcpu=v4` instead of `-mcpu=v5`
2. make both load-acquire and store-release `BPF_STX | BPF_ATOMIC` insns, instead of introducing new mode modifiers:

`BPF_STX | BPF_ATOMIC` insns use a subset of `BPFArithOp<>` in bit `4-7` of the `imm` field.  Currently these are in use:

```
def BPF_ADD  : BPFArithOp<0x0>;
def BPF_OR   : BPFArithOp<0x4>;
def BPF_AND  : BPFArithOp<0x5>;
def BPF_XOR  : BPFArithOp<0xa>;
def BPF_XCHG    : BPFArithOp<0xe>;
def BPF_CMPXCHG : BPFArithOp<0xf>;
```

On top of this, I tentatively chose `0x1` for `BPF_LOAD_ACQ`, and `0xb` for `BPF_STORE_REL`, because:
 - `0x1` is `BPF_SUB` in `BPFArithOp<>`, but atomic SUB is implemented using a NEG + ADD:
```
    // atomic_load_sub can be represented as a neg followed
    // by an atomic_load_add.
```
 - `0xb` is `BPF_MOV` in `BPFArithOp<>`, which is "moving value between registers" and isn't applicable for atomic (memory) operations

Please let me know if you have better suggestions for encoding (or if I'm overthinking this and we can simply use `0x1` and `0x2`).
- - -
@eddyz87, do my changes to `BPFMISimplifyPatchable.cpp` still look good to you?  Now that load-acquires are `STX` insns, I wanted to make sure that `BPFMISimplifyPatchable::checkADDrr()` can still handle them correctly for CO-RE.

I'll add tests to `llvm/test/CodeGen/BPF/CORE/` later, like what you did in commit 08d92dedd26c ("[BPF] Fix in/out argument constraints for CORE_MEM instructions").
- - -
About whether we should sign-extend for 8- and 16-bit load-acquires (brought up by Yonghong):

All ARM64 insns that match `acquiring_load<atomic_load[_az]_{8,16}>` seem to zero-extend the value before writing it to register, like, `LDAPRH`:

> Load-Acquire RCpc Register Halfword derives an address from a base register value, loads a halfword from the derived address in memory, zero-extends it and writes it to a register.

So right now I'm keeping our `LDBACQ32` and `LDHACQ32` to zero-extend.  I'll take a look at other archs later.

https://github.com/llvm/llvm-project/pull/108636