[llvm] [BPF] expand mem intrinsics (memcpy, memmove, memset) (PR #97648)
via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 4 19:06:12 PDT 2024
eddyz87 wrote:
Hi @inclyc ,
I tried this patch with the Linux kernel BPF selftests, and it triggered a verification failure in `async_stack_depth.bpf.o`.
The reason for the failure is a change in the way `char buf[256] = {}` is translated.
Before this change it looked as follows:
```
0000000000000000 <timer_cb>:
; {
0: b7 01 00 00 00 00 00 00 r1 = 0x0
...
; volatile char buf[256] = {};
...
9: b4 02 00 00 00 00 00 00 w2 = 0x0
10: 63 2a f8 ff 00 00 00 00 *(u32 *)(r10 - 0x8) = r2
11: 73 2a fc ff 00 00 00 00 *(u8 *)(r10 - 0x4) = r2
12: 73 2a b7 ff 00 00 00 00 *(u8 *)(r10 - 0x49) = r2
13: 6b 2a b0 ff 00 00 00 00 *(u16 *)(r10 - 0x50) = r2
14: 7b 1a a8 ff 00 00 00 00 *(u64 *)(r10 - 0x58) = r1
15: 7b 1a a0 ff 00 00 00 00 *(u64 *)(r10 - 0x60) = r1
16: 7b 1a 98 ff 00 00 00 00 *(u64 *)(r10 - 0x68) = r1
17: 7b 1a 90 ff 00 00 00 00 *(u64 *)(r10 - 0x70) = r1
18: 7b 1a 88 ff 00 00 00 00 *(u64 *)(r10 - 0x78) = r1
...
```
After this change it looks as follows:
```
; volatile char buf[256] = {};
9: 73 1a ba ff 00 00 00 00 *(u8 *)(r10 - 0x46) = r1
10: b7 02 00 00 00 00 00 00 r2 = 0x0
11: bf a3 00 00 00 00 00 00 r3 = r10
12: 07 03 00 00 00 ff ff ff r3 += -0x100
13: 0f 23 00 00 00 00 00 00 r3 += r2
14: 73 13 00 00 00 00 00 00 *(u8 *)(r3 + 0x0) = r1
15: 07 02 00 00 01 00 00 00 r2 += 0x1
16: a5 02 fa ff ba 00 00 00 if r2 < 0xba goto -0x6 <timer_cb+0x58>
```
Basically, the fully unrolled version was replaced by a loop.
There is some code in `BPFISelLowering.cpp` that specifies the limits for full unrolling:
```cpp
BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM,
                                     const BPFSubtarget &STI)
    : TargetLowering(TM) {
  ...
  MaxStoresPerMemset = MaxStoresPerMemsetOptSize = 0;
  MaxStoresPerMemcpy = MaxStoresPerMemcpyOptSize = 0;
  MaxStoresPerMemmove = MaxStoresPerMemmoveOptSize = 0;
  ...
}
```
Because of the way the kernel BPF verifier works, the unrolled version is preferable to the loop: the verifier traces execution, so processing a loop consumes more of its instruction budget. Is it possible to make `llvm::expandMemMoveAsLoop()` respect the limits set by `MaxStoresPerMemset`?
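To put rough numbers on the budget concern, here is a small standalone model. It is not LLVM code: `countUnrolledStores`, `countLoopInsns`, `chooseExpansion`, and the `max_stores` parameter are hypothetical names for illustration only. It assumes the unrolled form emits the widest store first with suitable alignment, and that the verifier walks every iteration of the byte-wise loop.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not LLVM code: number of stores needed to zero `bytes`
// bytes when emitting the widest store first (8, 4, 2, then 1 byte).
// Assumes the destination is suitably aligned for every store width.
std::uint64_t countUnrolledStores(std::uint64_t bytes) {
  std::uint64_t stores = 0;
  for (std::uint64_t width = 8; width >= 1; width /= 2) {
    stores += bytes / width;  // as many stores of this width as fit
    bytes %= width;           // remainder is handled by narrower stores
  }
  return stores;
}

// Instructions walked for a byte-wise loop, assuming every iteration is
// traced and each iteration costs `insns_per_iteration` instructions.
std::uint64_t countLoopInsns(std::uint64_t bytes,
                             std::uint64_t insns_per_iteration) {
  assert(insns_per_iteration > 0);
  return bytes * insns_per_iteration;
}

enum class Expansion { UnrolledStores, Loop };

// Hypothetical policy: unroll whenever the store count fits within the
// limit, otherwise fall back to a loop. `max_stores` stands in for a limit
// such as MaxStoresPerMemset; how the real pass would obtain it is not
// modeled here.
Expansion chooseExpansion(std::uint64_t bytes, std::uint64_t max_stores) {
  return countUnrolledStores(bytes) <= max_stores ? Expansion::UnrolledStores
                                                  : Expansion::Loop;
}
```

For the loop in the listing above (4 instructions per iteration, instructions 13 to 16, with a bound of `0xba` = 186), `countLoopInsns(186, 4)` gives 744 traced instructions, while `countUnrolledStores(186)` gives 24 stores under the alignment assumption. The real unrolled output mixes store widths because of alignment, so its count is somewhat higher, but still far below the loop.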
@yonghong-song , fyi.
https://github.com/llvm/llvm-project/pull/97648