[llvm] [BPF] expand mem intrinsics (memcpy, memmove, memset) (PR #97648)
via llvm-commits
llvm-commits at lists.llvm.org
Fri Jul 5 09:22:31 PDT 2024
yonghong-song wrote:
If user code contains a call to memcpy that cannot simply be unrolled, the current behavior is to issue an error. For example,
```
$ cat test1.c
#include <stdint.h>
#include <string.h>
typedef struct {
  unsigned char x[8];
} buf_t;
void f(buf_t *buf, uint64_t y, uint64_t z) {
  if (z > 8) z = 8;
  unsigned char *y_bytes = (unsigned char *)&y;
  memcpy(buf->x, y_bytes, z);
}
```
With the current compiler (llvm18), we get
```
/* I added gnu/stubs-32.h to the current directory so that compilation passes */
$ clang -O2 -target bpf -I . test1.c -c
test1.c:6:6: error: A call to built-in function 'memcpy' is not supported.
    6 | void f(buf_t *buf, uint64_t y, uint64_t z) {
      |      ^
1 error generated.
```
But with this patch, I see
```
0000000000000000 <f>:
0: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 0x8) = r2
1: b7 04 00 00 08 00 00 00 r4 = 0x8
2: bf 32 00 00 00 00 00 00 r2 = r3
3: 2d 34 01 00 00 00 00 00 if r4 > r3 goto +0x1 <f+0x28>
4: b7 02 00 00 08 00 00 00 r2 = 0x8
5: 15 02 0b 00 00 00 00 00 if r2 == 0x0 goto +0xb <f+0x88>
6: b7 02 00 00 00 00 00 00 r2 = 0x0
7: bf 15 00 00 00 00 00 00 r5 = r1
8: 0f 25 00 00 00 00 00 00 r5 += r2
9: bf a0 00 00 00 00 00 00 r0 = r10
10: 07 00 00 00 f8 ff ff ff r0 += -0x8
11: 0f 20 00 00 00 00 00 00 r0 += r2
12: 71 00 00 00 00 00 00 00 r0 = *(u8 *)(r0 + 0x0)
13: 73 05 00 00 00 00 00 00 *(u8 *)(r5 + 0x0) = r0
14: 07 02 00 00 01 00 00 00 r2 += 0x1
15: 3d 32 01 00 00 00 00 00 if r2 >= r3 goto +0x1 <f+0x88>
16: 2d 24 f6 ff 00 00 00 00 if r4 > r2 goto -0xa <f+0x38>
17: 95 00 00 00 00 00 00 00 exit
```
Basically, memcpy is 'inlined' by the compiler as a byte-by-byte copy loop.
This is probably not what we want for test1.c, since the memcpy call comes from the user. In such cases, we would like the user to write the loop explicitly to get maximum performance (e.g., u64 loads/stores for the bulk and u8 loads/stores for the remainder).
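To illustrate the kind of explicit loop meant here, below is a minimal sketch in plain C. The helper name `copy_u64_then_u8` is hypothetical and not part of the patch. It uses fixed-size 8-byte `memcpy` calls for the bulk, on the assumption that the compiler can lower these without a loop, and a byte loop for the tail. It has not been checked against the BPF backend; whether each 8-byte copy becomes a single u64 load/store depends on alignment, and the optimizer may still rewrite the byte loop.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): copy n bytes from src to dst,
 * moving 8 bytes at a time while possible, then the remaining 0..7 bytes
 * one at a time. */
static inline void copy_u64_then_u8(unsigned char *dst,
                                    const unsigned char *src, uint64_t n)
{
    uint64_t i = 0;

    /* Bulk: fixed-size 8-byte copies through a uint64_t temporary. */
    for (; n - i >= 8; i += 8) {
        uint64_t w;
        memcpy(&w, src + i, sizeof(w));
        memcpy(dst + i, &w, sizeof(w));
    }

    /* Tail: copy the leftover bytes individually. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

In test1.c, the call `memcpy(buf->x, y_bytes, z);` would then become `copy_u64_then_u8(buf->x, y_bytes, z);`, placed after `z` has been clamped to 8.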
Maybe we should somehow prevent memcpy generation if it cannot be fully unrolled later (meaning no loops)? We have some backend hooks at the IR level; maybe we can leverage them?
https://github.com/llvm/llvm-project/pull/97648