[PATCH] D140804: [BPF] support for BPF_ST instruction in codegen

Fri Aug 4 09:08:05 PDT 2023

eddyz87 created this revision.
Herald added a subscriber: hiraditya.
Herald added a project: All.
eddyz87 updated this revision to Diff 545494.
eddyz87 added a comment.
eddyz87 updated this revision to Diff 547064.
eddyz87 retitled this revision from "BPF: support for BPF_ST instruction in codegen" to "[BPF] support for BPF_ST instruction in codegen".
eddyz87 edited the summary of this revision.
eddyz87 updated this revision to Diff 547186.
eddyz87 published this revision for review.
eddyz87 added a reviewer: yonghong-song.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

Generate BPF_ST when cpuv4 is specified.

eddyz87 added a comment.

BPFMISimplifyPatchable::checkADDrr for BPF_ST instructions.

eddyz87 added a comment.

Rebase

eddyz87 added a comment.

Hi Yonghong,

Could you please take a look at this revision? It enables generation of the BPF_ST instruction when CPUv4 is selected.
When it is enabled, the following kernel BPF selftests fail:

- log_fixup/missing_map
- spin_lock/lock_id_mapval_preserve
- spin_lock/lock_id_innermapval_preserve

All failures are caused by differences in the expected log messages (when BPF_ST is enabled, fewer instructions are generated, so the instruction numbers in the log are slightly off). I will submit a kernel patch to relax the log messages after this revision is accepted (but before landing it).

Impact, based on the kernel selftests:

- in total 653 *.bpf.o files are generated
- 377 are identical
- 265 have fewer instructions
- 2 have more instructions

Most of the changes are obvious: sequences like `r0 = 0; *(u64 *)(r10 - 8) = r0;` are replaced by a single instruction. For tests where the number of instructions increased I took a closer look:

- ip_check_defrag.bpf.o, # of insns increased from 58 to 63: a few more instructions are generated because of a difference in register allocation
- pyperf_subprogs.bpf.o, # of insns increased from 4421 to 4434: I can't pinpoint the stage at which the additional instructions are generated; they seem to accumulate from slight differences in register allocation and spilling decisions.

Generate store immediate instruction when CPUv4 is enabled.
For example:

  $ cat test.c
  struct foo {
    unsigned char  b;
    unsigned short h;
    unsigned int   w;
    unsigned long  d;
  };
  void bar(volatile struct foo *p) {
    p->b = 1;
    p->h = 2;
    p->w = 3;
    p->d = 4;
  }

  $ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - | llvm-objdump -d -
  ...
  0000000000000000 <bar>:
         0:	72 01 00 00 01 00 00 00	*(u8 *)(r1 + 0x0) = 0x1
         1:	6a 01 02 00 02 00 00 00	*(u16 *)(r1 + 0x2) = 0x2
         2:	62 01 04 00 03 00 00 00	*(u32 *)(r1 + 0x4) = 0x3
         3:	7a 01 08 00 04 00 00 00	*(u64 *)(r1 + 0x8) = 0x4
         4:	95 00 00 00 00 00 00 00	exit
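
To make the byte columns in the dump above easier to read, here is a minimal C sketch that decodes one 8-byte instruction. It assumes the little-endian layout shown in the listing (opcode byte, register byte, 16-bit offset, 32-bit immediate). All `sk_`/`SK_` names are made up for this illustration and are not part of the patch.

```c
#include <stdint.h>

/* Opcode fields as laid out in the BPF instruction encoding.
 * The SK_ and sk_ names are hypothetical, chosen for this sketch only. */
#define SK_CLASS_MASK 0x07
#define SK_CLASS_ST   0x02   /* store-immediate class */
#define SK_SIZE_MASK  0x18
#define SK_SIZE_W     0x00   /* 4 bytes */
#define SK_SIZE_H     0x08   /* 2 bytes */
#define SK_SIZE_B     0x10   /* 1 byte  */
#define SK_SIZE_DW    0x18   /* 8 bytes */
#define SK_MODE_MASK  0xe0
#define SK_MODE_MEM   0x60

struct sk_st_insn {
    int     is_st_mem;   /* non-zero if opcode is BPF_ST | BPF_MEM | <size> */
    int     size_bytes;  /* store width; meaningful only when is_st_mem */
    int     dst_reg;     /* base register of the store address */
    int16_t off;         /* signed offset added to dst_reg */
    int32_t imm;         /* immediate value to store */
};

/* Decode one little-endian 8-byte instruction. */
static struct sk_st_insn sk_decode(const uint8_t raw[8]) {
    struct sk_st_insn d;
    uint8_t opcode = raw[0];

    d.is_st_mem = (opcode & SK_CLASS_MASK) == SK_CLASS_ST &&
                  (opcode & SK_MODE_MASK) == SK_MODE_MEM;
    switch (opcode & SK_SIZE_MASK) {
    case SK_SIZE_B: d.size_bytes = 1; break;
    case SK_SIZE_H: d.size_bytes = 2; break;
    case SK_SIZE_W: d.size_bytes = 4; break;
    default:        d.size_bytes = 8; break; /* SK_SIZE_DW */
    }
    d.dst_reg = raw[1] & 0x0f;  /* low nibble holds the destination register */
    d.off = (int16_t)(uint16_t)(raw[2] | (raw[3] << 8));
    d.imm = (int32_t)((uint32_t)raw[4] | ((uint32_t)raw[5] << 8) |
                      ((uint32_t)raw[6] << 16) | ((uint32_t)raw[7] << 24));
    return d;
}
```

Feeding it the first line of the dump (`72 01 00 00 01 00 00 00`) yields a 1-byte store of immediate 1 to r1 + 0. The last line (`95 ...`, exit) is not in the ST class, so `is_st_mem` is zero for it.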

Take special care to:

- apply the `BPFMISimplifyPatchable::checkADDrr` rewrite to BPF_ST
- validate the immediate value when the BPF_ST write is 64-bit: BPF interprets `(BPF_ST | BPF_MEM | BPF_DW)` as a store of the 32-bit immediate sign-extended to 64 bits. Thus it is fine to generate such a write when the immediate is -1, but it is incorrect to generate one when the immediate is +0xffff_ffff, because the value stored would be 0xffff_ffff_ffff_ffff.
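
The 64-bit immediate constraint can be sketched in C as follows. This is only an illustration of the sign-extension rule described above, not the check used in the patch; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when `value` can be expressed as a 32-bit
 * immediate that sign-extends back to exactly `value`, i.e. when a
 * 64-bit BPF_ST store of that immediate writes the intended bits. */
static bool fits_sign_extended_imm32(int64_t value) {
    return value >= INT32_MIN && value <= INT32_MAX;
}

/* Hypothetical model of what a 64-bit BPF_ST store writes to memory
 * for a given 32-bit immediate: the immediate sign-extended to 64 bits. */
static uint64_t st_dw_stored_value(int32_t imm) {
    return (uint64_t)(int64_t)imm;
}
```

With this model, -1 passes the check and is stored as all ones, while +0xffff_ffff fails it: the only 32-bit immediate with that bit pattern is -1, which would again be stored as all ones rather than as 0x0000_0000_ffff_ffff.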

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D140804

Files:
  llvm/lib/Target/BPF/BPFInstrInfo.td
  llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp
  llvm/lib/Target/BPF/BPFSubtarget.cpp
  llvm/lib/Target/BPF/BPFSubtarget.h
  llvm/test/CodeGen/BPF/CORE/field-reloc-st-imm.ll
  llvm/test/CodeGen/BPF/store_imm.ll
