[llvm] [BPF] expand mem intrinsics (memcpy, memmove, memset) (PR #97648)

Thu Jul 4 19:06:12 PDT 2024

eddyz87 wrote:

Hi @inclyc ,

I tried this patch with BPF Linux kernel selftests and this triggered a verification failure with `async_stack_depth.bpf.o`.
The reason for this failure is a change in the way `char buf[256] = {}` is translated.
Before this change it looked as follows:

```
0000000000000000 <timer_cb>:
; {
       0:	b7 01 00 00 00 00 00 00	r1 = 0x0
    ...
; 	volatile char buf[256] = {};
    ...
       9:	b4 02 00 00 00 00 00 00	w2 = 0x0
      10:	63 2a f8 ff 00 00 00 00	*(u32 *)(r10 - 0x8) = r2
      11:	73 2a fc ff 00 00 00 00	*(u8 *)(r10 - 0x4) = r2
      12:	73 2a b7 ff 00 00 00 00	*(u8 *)(r10 - 0x49) = r2
      13:	6b 2a b0 ff 00 00 00 00	*(u16 *)(r10 - 0x50) = r2
      14:	7b 1a a8 ff 00 00 00 00	*(u64 *)(r10 - 0x58) = r1
      15:	7b 1a a0 ff 00 00 00 00	*(u64 *)(r10 - 0x60) = r1
      16:	7b 1a 98 ff 00 00 00 00	*(u64 *)(r10 - 0x68) = r1
      17:	7b 1a 90 ff 00 00 00 00	*(u64 *)(r10 - 0x70) = r1
      18:	7b 1a 88 ff 00 00 00 00	*(u64 *)(r10 - 0x78) = r1
    ...
```

After this change it looks as follows:

```
; 	volatile char buf[256] = {};
       9:	73 1a ba ff 00 00 00 00	*(u8 *)(r10 - 0x46) = r1
      10:	b7 02 00 00 00 00 00 00	r2 = 0x0
      11:	bf a3 00 00 00 00 00 00	r3 = r10
      12:	07 03 00 00 00 ff ff ff	r3 += -0x100
      13:	0f 23 00 00 00 00 00 00	r3 += r2
      14:	73 13 00 00 00 00 00 00	*(u8 *)(r3 + 0x0) = r1
      15:	07 02 00 00 01 00 00 00	r2 += 0x1
      16:	a5 02 fa ff ba 00 00 00	if r2 < 0xba goto -0x6 <timer_cb+0x58>
```

Basically, the fully unrolled version was replaced by a loop.
There is some code in the `BPFISelLowering.cpp` that specifies limits for full unroll:

```cpp
BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM,
                                     const BPFSubtarget &STI)
    : TargetLowering(TM) {
  ...
    MaxStoresPerMemset = MaxStoresPerMemsetOptSize = 0;
    MaxStoresPerMemcpy = MaxStoresPerMemcpyOptSize = 0;
    MaxStoresPerMemmove = MaxStoresPerMemmoveOptSize = 0;
  ...
}
```

Because of the way the kernel BPF verifier works, the unrolled version is preferable to the loop: the verifier traces execution, so processing a loop consumes more of its instruction budget. Is it possible to make `llvm::expandMemMoveAsLoop()` respect the limits set by `MaxStoresPerMemset`?
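
To put rough numbers on this, here is a small back-of-the-envelope sketch. It is not LLVM code: `unrolledStoreCount`, `loopInsnsWalked` and `fitsStoreLimit` are hypothetical helpers made up for illustration. It assumes the verifier walks every loop iteration without pruning and that an unrolled memset always uses the widest store that fits, ignoring alignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: stores needed to zero `bytes` bytes when always
// emitting the widest store that still fits (8, 4, 2, then 1 byte).
// Alignment constraints are ignored for simplicity.
inline uint64_t unrolledStoreCount(uint64_t bytes) {
  uint64_t stores = 0;
  for (uint64_t width = 8; width >= 1; width /= 2) {
    stores += bytes / width; // as many stores of this width as fit
    bytes %= width;          // remainder is handled by narrower stores
  }
  return stores;
}

// Hypothetical helper: instructions the verifier would trace for a loop,
// assuming it walks every iteration and prunes nothing.
inline uint64_t loopInsnsWalked(uint64_t iterations, uint64_t bodyInsns) {
  assert(bodyInsns > 0 && "a loop body has at least the back-edge branch");
  return iterations * bodyInsns;
}

// Hypothetical check in the spirit of a MaxStoresPerMemset-style limit:
// unroll only if the store count stays within `maxStores`.
inline bool fitsStoreLimit(uint64_t bytes, uint64_t maxStores) {
  return unrolledStoreCount(bytes) <= maxStores;
}
```

In the listing above the back edge at instruction 16 targets instruction 11, so the loop body is 6 instructions and runs 0xba = 186 times. Under these assumptions `loopInsnsWalked(186, 6)` gives 1116 traced instructions, while `unrolledStoreCount(256)` gives 32 stores for the whole buffer.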

@yonghong-song , FYI.

https://github.com/llvm/llvm-project/pull/97648