[PATCH] D133361: [BPF] Attribute btf_decl_tag("ctx") for structs

Eduard Zingerman via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Sun Jul 2 18:20:52 PDT 2023


eddyz87 created this revision.
Herald added subscribers: hiraditya, mgorny.
Herald added a reviewer: aaron.ballman.
Herald added a project: All.
eddyz87 updated this revision to Diff 459560.
eddyz87 added a comment.
eddyz87 updated this revision to Diff 460356.
eddyz87 updated this revision to Diff 461103.
eddyz87 updated this revision to Diff 461317.
eddyz87 updated this revision to Diff 462038.
eddyz87 updated this revision to Diff 462709.
eddyz87 updated this revision to Diff 534561.
Herald added a subscriber: zzheng.
eddyz87 updated this revision to Diff 535210.
eddyz87 updated this revision to Diff 535380.
eddyz87 updated this revision to Diff 536044.
eddyz87 updated this revision to Diff 536643.
eddyz87 published this revision for review.
eddyz87 edited reviewers, added: yonghong-song; removed: aaron.ballman.
Herald added subscribers: llvm-commits, cfe-commits, jdoerfert.
Herald added a reviewer: aaron.ballman.
Herald added projects: clang, LLVM.

- Added folding for GEP chains with constant indices:
  - inline anonymous unions work as expected
  - type casts to char* with consequitive array access work as expected
- Moved context rewrite to a late callback function processing pipeline stage:
  - necessary to run the rewrite after loop unrolling, as rewrite can't handle non-constant indices for non structural GEP chains


eddyz87 added a comment.

- added GVN pass after context rewrite to handle cases clobbered by marker calls
- added marker simplification and context rewrite passes for `opt` pipeline
- more test cases


eddyz87 added a comment.

Merged simplify and rewrite passes, various adjustments to unclobber CSE and GVN opportunities


eddyz87 added a comment.

Merge getelementptr and get.and.load/store calls for function inlining


eddyz87 added a comment.

Force doesNotAccessMemory for context marker function


eddyz87 added a comment.

Moved readonly/writeonly attrs as properties of call site to handle volatile load and stores


eddyz87 added a comment.

Rebase, replaced .c based back-end changes by .ll based ones.


eddyz87 added a comment.

- renamed llvm.bpf.context.marker to llvm.context.marker.bpf (to avoid issues with intrinsic being looked up in the wrong table and not getting an ID / Attrs for CallInst);
- use new pass manager for tests.


eddyz87 added a comment.

- removed lots of uses of `auto`
- removed usage of std::set


eddyz87 added a comment.

Rebase.


eddyz87 added a comment.

Generate immarg for index arguments of gep.and.load, gep.and.store.


eddyz87 added a comment.

Hi Yonghong,

Could you please take a look at this revision? I tried to describe the mechanics in the description / commit message and comments. Below are details regarding testing.
I tested these changes using a Linux Kernel special branch that declares decl_tag("ctx") for relevant functions (available here <https://github.com/eddyz87/bpf/tree/decl-tag-ctx>).
While comparing object files generated without decl_tag("ctx") and with decl_tag("ctx") I found some differences in 5 tests. (I use "with ctx" and "without ctx" below as a shorthand).

bpf_flow.bpf.o
--------------

Without ctx: 2713 instructions
With    ctx: 2711 instructions

Difference: 2 load instructions appear in different BB and this requires two additional register to register assignments in "without ctx" case.
Difference is caused by `GVNPass::PerformLoadPRE` sub-pass and disappears if `-mllvm -enable-load-pre=false` option is passed. As the name implies this sub-pass operates only on `load` instructions.

connect4_prog.bpf.o
-------------------

Without ctx: 351 instructions
With    ctx: 351 instructions

Difference: one redundant load instruction is put in it's own basic block when compiled without ctx.

  c
  static __inline int set_notsent_lowat(struct bpf_sock_addr *ctx)
  { ... ctx->type ... }
  
  int connect_v4_prog(struct bpf_sock_addr *ctx)
  {
  	...
  	if (set_notsent_lowat(ctx))
  		return 0;
  
  	if (ctx->type ...) // ctx->type load is partially redundat after inlining
  		return 0;
  }

In the without ctx this is done by a part of `JumpThreadingPass`:

  c++
  /// simplifyPartiallyRedundantLoad - If LoadI is an obviously partially
  /// redundant load instruction, eliminate it by replacing it with a PHI node.
  /// This is an important optimization that encourages jump threading, and needs
  /// to be run interlaced with other jump threading tasks.
  bool JumpThreadingPass::simplifyPartiallyRedundantLoad(LoadInst *LoadI)



test_cls_redirect.bpf.o
-----------------------

Without ctx: 1616 instructions
With    ctx: 1615 instructions

Difference: some differences in basic blocks placement. IR basic block structure is identical up until 'Branch Probability Basic Block Placement' machine pass.
Before that in without ctx case there is one additional basic block, which is introduced by GVNPass on `process_icmpv4` function. Differences caused by `GVNPass::PerformLoadPRE` sub-pass which calls
`llvm::SplitCriticalEdge` that introduces the basic block.

test_misc_tcp_hdr_options.bpf.o
-------------------------------

Without ctx: 611 instructions
With    ctx: 612 instructions

Difference: one more load instruction generated in with ctx case. Happens in the following code snippet:

  c
  unsigned int nr_data = 0;
  ...
  static int check_active_hdr_in(struct bpf_sock_ops *skops)
  {
  	...
  	if (... < skops->skb_len)  // (1)
  		nr_data++;
  	...
  	if (... == skops->skb_len) // (2)
  		nr_pure_ack++;
  	...
  }

When compiled without ctx `EarlyCSEPass` reuses `skops->skb_len` computed at (1) for point (2). However, when compiled with ctx this does not happen. This happens because clang is not sure if `skops`, passed as parameter to `getelementptr.and.load` shares memory with `nr_data`. The following modification removes all differences:

  c
  static int check_active_hdr_in(struct bpf_sock_ops * restrict skops)



test_tcpnotify_kern.bpf.o
-------------------------

Without ctx: 57 instructions
With    ctx: 58 instructions

Difference: one more store instruction generated, CFG structure differs slightly.
Difference is caused by `SimplifyCFGPass`, difference disappears if `-mllvm -sink-common-insts=false` option is supplied. The following IR snippet is the cause:

  llvm
  ...
  if.then:                                          ; preds = %entry
    %1 = getelementptr inbounds %struct.bpf_sock_ops, ptr %skops, i64 0, i32 1
    store i32 -1, ptr %1, align 4, !tbaa !240
    br label %cleanup30
  ...
  ...
  sw.epilog:                                        ; preds = %if.end, ...
    %rv.0 = phi i32 [ -1, %sw.default ], ...
    %7 = getelementptr inbounds %struct.bpf_sock_ops, ptr %skops, i64 0, i32 1
    store i32 %rv.0, ptr %7, align 4, !tbaa !240
    br label %cleanup30, !dbg !278
  
  cleanup30:                                        ; preds = %sw.epilog, %if.then
    %retval.0 = phi i32 [ 0, %if.then ], [ 1, %sw.epilog ], !dbg !226
    ret i32 %retval.0

Without ctx `SimplifyCFGPass` sinks `getelementptr` and `store` instructions to `cleanup30` BB:

  llvm
  cleanup30:                                        ; preds = %sw.bb19, ...
    %rv.0.sink = phi i32 [ -1, %entry ], ...
    %retval.0 = phi i32 [ 0, %entry ], ...
    %6 = getelementptr inbounds %struct.bpf_sock_ops, ptr %skops, i64 0, i32 1, !dbg !226
    store i32 %rv.0.sink, ptr %6, align 4, !dbg !226, !tbaa !270
    ret i32 %retval.0, !dbg !271

For ctx case snippet looks as follows:

  llvm
  if.then:                                          ; preds = %entry
    tail call void ... @llvm.bpf.getelementptr.and.store.i32
      (i32 -1, ptr nonnull writeonly elementtype(%struct.bpf_sock_ops) %skops, ...
    br label %cleanup30
  ...
  ...
  sw.epilog:                                        ; preds = %if.end ...
    %rv.0 = phi i32 [ -1, %sw.default ], ...
    call void ... @llvm.bpf.getelementptr.and.store.i32
      (i32 %rv.0, ptr nonnull writeonly elementtype(%struct.bpf_sock_ops) %skops, ...
    br label %cleanup30
  
  cleanup30:                                        ; preds = %sw.epilog, %if.then
    %retval.0 = phi i32 [ 0, %if.then ], [ 1, %sw.epilog ], !dbg !228
    ret i32 %retval.0, !dbg !279

Note that in `if.then` the instruction is `tail call` while for `sw.epilog` the instruction is `call`. These instructions are considered not to be the same by `SimplifyCFG.cpp:canSinkInstructions`. I have an old merge request that fixes this: https://reviews.llvm.org/D134743 . Need to rebase it and ping someone to review.


This commit adds a new BPF specific structure attribte
`__attribute__((btf_decl_tag("ctx")))` and a pass to deal with it.

This attribute may be attached to a struct or union declaration, where
it notifies the compiler that this structure is a "context" structure.
The following limitations apply to context structures:

- runtime environment might patch access to the fields of this type by updating the field offset;

  BPF verifier limits access patterns allowed for certain data types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these types only `LD/ST <reg> <static-offset>` memory loads and stores are allowed.

  This is so because offsets of the fields of these structures do not match real offsets in the running kernel. During BPF program load/verification loads and stores to the fields of these types are rewritten so that offsets match real offsets. For this rewrite to happen static offsets have to be encoded in the instructions.

  See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux kernel source tree for details.

- runtime environment might disallow access to the field of the type through modified pointers.

  During BPF program verification a tag `PTR_TO_CTX` is tracked for register values. In case if register with such tag is modified BPF programs are not allowed to read or write memory using register. See kernel/bpf/verifier.c:check_mem_access function in the Linux kernel source tree for details.

Access to the structure fields is translated to IR as a sequence:

- `(load (getelementptr %ptr %offset))` or
- `(store (getelementptr %ptr %offset))`

During instruction selection phase such sequences are translated as a
single load instruction with embedded offset, e.g. `LDW %ptr, %offset`,
which matches access pattern necessary for the restricted
set of types described above (when `%offset` is static).

Multiple optimizer passes might separate these instructions, this
includes:

- SimplifyCFGPass (sinking)
- InstCombine (sinking)
- GVN (hoisting)

The `btf_decl_tag("ctx")` attribute marks structures for which the
following transformations happen:

- at the early IR processing stage:
  - `(load (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.load`;
  - `(store (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.store`;
- at the late IR processing stage this modification is undone.

Such handling prevents various optimizer passes from generating
sequences of instructions that would be rejected by BPF verifier.

The __attribute__((btf_decl_tag("ctx"))) has a priority over
__attribute__((preserve_access_index)). When "ctx" attribute is
present preserve access index transformations are not applied.

To ensure this, handling of the BTFDeclTagAttr in SemaDeclAttr.cpp is
updated to recursively propagate "ctx" attribute to nested structure
declarations as it is done for BPFPreserveAccessIndexAttr.

This addresses the issue reported by the following thread:

https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D133361

Files:
  clang/lib/CodeGen/CGExpr.cpp
  clang/lib/Sema/SemaDeclAttr.cpp
  clang/test/CodeGen/btf-decl-tag-ctx-arr.c
  clang/test/CodeGen/btf-decl-tag-ctx-bitfield.c
  clang/test/CodeGen/btf-decl-tag-ctx-lvalue.c
  clang/test/CodeGen/btf-decl-tag-ctx-non-bpf.c
  clang/test/CodeGen/btf-decl-tag-ctx-pai.c
  clang/test/Sema/btf-decl-tag-ctx-ast.c
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/IR/IntrinsicsBPF.td
  llvm/lib/Target/BPF/BPF.h
  llvm/lib/Target/BPF/BPFCheckAndAdjustIR.cpp
  llvm/lib/Target/BPF/BPFContextMarker.cpp
  llvm/lib/Target/BPF/BPFTargetMachine.cpp
  llvm/lib/Target/BPF/CMakeLists.txt
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-align.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-atomic.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-chain-2.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-chain-oob.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-chain-u8-oob.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-chain-u8-type-mismatch.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-chain-u8.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-chain.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-inline.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-non-const.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-simple.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-undo-align.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-undo-chain-oob.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-undo-chain-u8.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-undo-chain.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-undo-simple.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-undo-volatile.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-unroll-inline.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-unroll.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-volatile.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/load-zero.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-align.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-atomic.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-chain-2.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-chain-oob.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-chain-u8-oob.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-chain-u8.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-chain.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-simple.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-undo-align.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-undo-chain-oob.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-undo-chain-u8.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-undo-chain.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-undo-simple.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-undo-volatile.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-unroll-inline.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-volatile.ll
  llvm/test/CodeGen/BPF/decl-tag-ctx/store-zero.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D133361.536643.patch
Type: text/x-patch
Size: 148553 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20230703/cb186ae9/attachment-0001.bin>


More information about the cfe-commits mailing list