[PATCH] D133361: [BPF] Attribute btf_decl_tag("ctx") for structs
Eduard Zingerman via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Sun Jul 2 18:20:52 PDT 2023
eddyz87 created this revision.
Herald added subscribers: hiraditya, mgorny.
Herald added a reviewer: aaron.ballman.
Herald added a project: All.
eddyz87 updated this revision to Diff 459560.
eddyz87 added a comment.
eddyz87 updated this revision to Diff 460356.
eddyz87 updated this revision to Diff 461103.
eddyz87 updated this revision to Diff 461317.
eddyz87 updated this revision to Diff 462038.
eddyz87 updated this revision to Diff 462709.
eddyz87 updated this revision to Diff 534561.
Herald added a subscriber: zzheng.
eddyz87 updated this revision to Diff 535210.
eddyz87 updated this revision to Diff 535380.
eddyz87 updated this revision to Diff 536044.
eddyz87 updated this revision to Diff 536643.
eddyz87 published this revision for review.
eddyz87 edited reviewers, added: yonghong-song; removed: aaron.ballman.
Herald added subscribers: llvm-commits, cfe-commits, jdoerfert.
Herald added a reviewer: aaron.ballman.
Herald added projects: clang, LLVM.
- Added folding for GEP chains with constant indices:
- inline anonymous unions work as expected
- type casts to char* with consequitive array access work as expected
- Moved context rewrite to a late callback function processing pipeline stage:
- necessary to run the rewrite after loop unrolling, as rewrite can't handle non-constant indices for non structural GEP chains
eddyz87 added a comment.
- added GVN pass after context rewrite to handle cases clobbered by marker calls
- added marker simplification and context rewrite passes for `opt` pipeline
- more test cases
eddyz87 added a comment.
Merged simplify and rewrite passes, various adjustments to unclobber CSE and GVN opportunities
eddyz87 added a comment.
Merge getelementptr and get.and.load/store calls for function inlining
eddyz87 added a comment.
Force doesNotAccessMemory for context marker function
eddyz87 added a comment.
Moved readonly/writeonly attrs as properties of call site to handle volatile load and stores
eddyz87 added a comment.
Rebase, replaced .c based back-end changes by .ll based ones.
eddyz87 added a comment.
- renamed llvm.bpf.context.marker to llvm.context.marker.bpf (to avoid issues with intrinsic being looked up in the wrong table and not getting an ID / Attrs for CallInst);
- use new pass manager for tests.
eddyz87 added a comment.
- removed lots of uses of `auto`
- removed usage of std::set
eddyz87 added a comment.
Rebase.
eddyz87 added a comment.
Generate immarg for index arguments of gep.and.load, gep.and.store.
eddyz87 added a comment.
Hi Yonghong,
Could you please take a look at this revision? I tried to describe the mechanics in the description / commit message and comments. Below are details regarding testing.
I tested these changes using a Linux Kernel special branch that declares decl_tag("ctx") for relevant functions (available here <https://github.com/eddyz87/bpf/tree/decl-tag-ctx>).
While comparing object files generated without decl_tag("ctx") and with decl_tag("ctx") I found some differences in 5 tests. (I use "with ctx" and "without ctx" below as a shorthand).
bpf_flow.bpf.o
--------------
Without ctx: 2713 instructions
With ctx: 2711 instructions
Difference: 2 load instructions appear in different BB and this requires two additional register to register assignments in "without ctx" case.
Difference is caused by `GVNPass::PerformLoadPRE` sub-pass and disappears if `-mllvm -enable-load-pre=false` option is passed. As the name implies this sub-pass operates only on `load` instructions.
connect4_prog.bpf.o
-------------------
Without ctx: 351 instructions
With ctx: 351 instructions
Difference: one redundant load instruction is put in it's own basic block when compiled without ctx.
c
static __inline int set_notsent_lowat(struct bpf_sock_addr *ctx)
{ ... ctx->type ... }
int connect_v4_prog(struct bpf_sock_addr *ctx)
{
...
if (set_notsent_lowat(ctx))
return 0;
if (ctx->type ...) // ctx->type load is partially redundat after inlining
return 0;
}
In the without ctx this is done by a part of `JumpThreadingPass`:
c++
/// simplifyPartiallyRedundantLoad - If LoadI is an obviously partially
/// redundant load instruction, eliminate it by replacing it with a PHI node.
/// This is an important optimization that encourages jump threading, and needs
/// to be run interlaced with other jump threading tasks.
bool JumpThreadingPass::simplifyPartiallyRedundantLoad(LoadInst *LoadI)
test_cls_redirect.bpf.o
-----------------------
Without ctx: 1616 instructions
With ctx: 1615 instructions
Difference: some differences in basic blocks placement. IR basic block structure is identical up until 'Branch Probability Basic Block Placement' machine pass.
Before that in without ctx case there is one additional basic block, which is introduced by GVNPass on `process_icmpv4` function. Differences caused by `GVNPass::PerformLoadPRE` sub-pass which calls
`llvm::SplitCriticalEdge` that introduces the basic block.
test_misc_tcp_hdr_options.bpf.o
-------------------------------
Without ctx: 611 instructions
With ctx: 612 instructions
Difference: one more load instruction generated in with ctx case. Happens in the following code snippet:
c
unsigned int nr_data = 0;
...
static int check_active_hdr_in(struct bpf_sock_ops *skops)
{
...
if (... < skops->skb_len) // (1)
nr_data++;
...
if (... == skops->skb_len) // (2)
nr_pure_ack++;
...
}
When compiled without ctx `EarlyCSEPass` reuses `skops->skb_len` computed at (1) for point (2). However, when compiled with ctx this does not happen. This happens because clang is not sure if `skops`, passed as parameter to `getelementptr.and.load` shares memory with `nr_data`. The following modification removes all differences:
c
static int check_active_hdr_in(struct bpf_sock_ops * restrict skops)
test_tcpnotify_kern.bpf.o
-------------------------
Without ctx: 57 instructions
With ctx: 58 instructions
Difference: one more store instruction generated, CFG structure differs slightly.
Difference is caused by `SimplifyCFGPass`, difference disappears if `-mllvm -sink-common-insts=false` option is supplied. The following IR snippet is the cause:
llvm
...
if.then: ; preds = %entry
%1 = getelementptr inbounds %struct.bpf_sock_ops, ptr %skops, i64 0, i32 1
store i32 -1, ptr %1, align 4, !tbaa !240
br label %cleanup30
...
...
sw.epilog: ; preds = %if.end, ...
%rv.0 = phi i32 [ -1, %sw.default ], ...
%7 = getelementptr inbounds %struct.bpf_sock_ops, ptr %skops, i64 0, i32 1
store i32 %rv.0, ptr %7, align 4, !tbaa !240
br label %cleanup30, !dbg !278
cleanup30: ; preds = %sw.epilog, %if.then
%retval.0 = phi i32 [ 0, %if.then ], [ 1, %sw.epilog ], !dbg !226
ret i32 %retval.0
Without ctx `SimplifyCFGPass` sinks `getelementptr` and `store` instructions to `cleanup30` BB:
llvm
cleanup30: ; preds = %sw.bb19, ...
%rv.0.sink = phi i32 [ -1, %entry ], ...
%retval.0 = phi i32 [ 0, %entry ], ...
%6 = getelementptr inbounds %struct.bpf_sock_ops, ptr %skops, i64 0, i32 1, !dbg !226
store i32 %rv.0.sink, ptr %6, align 4, !dbg !226, !tbaa !270
ret i32 %retval.0, !dbg !271
For ctx case snippet looks as follows:
llvm
if.then: ; preds = %entry
tail call void ... @llvm.bpf.getelementptr.and.store.i32
(i32 -1, ptr nonnull writeonly elementtype(%struct.bpf_sock_ops) %skops, ...
br label %cleanup30
...
...
sw.epilog: ; preds = %if.end ...
%rv.0 = phi i32 [ -1, %sw.default ], ...
call void ... @llvm.bpf.getelementptr.and.store.i32
(i32 %rv.0, ptr nonnull writeonly elementtype(%struct.bpf_sock_ops) %skops, ...
br label %cleanup30
cleanup30: ; preds = %sw.epilog, %if.then
%retval.0 = phi i32 [ 0, %if.then ], [ 1, %sw.epilog ], !dbg !228
ret i32 %retval.0, !dbg !279
Note that in `if.then` the instruction is `tail call` while for `sw.epilog` the instruction is `call`. These instructions are considered not to be the same by `SimplifyCFG.cpp:canSinkInstructions`. I have an old merge request that fixes this: https://reviews.llvm.org/D134743 . Need to rebase it and ping someone to review.
This commit adds a new BPF specific structure attribte
`__attribute__((btf_decl_tag("ctx")))` and a pass to deal with it.
This attribute may be attached to a struct or union declaration, where
it notifies the compiler that this structure is a "context" structure.
The following limitations apply to context structures:
- runtime environment might patch access to the fields of this type by updating the field offset;
BPF verifier limits access patterns allowed for certain data types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these types only `LD/ST <reg> <static-offset>` memory loads and stores are allowed.
This is so because offsets of the fields of these structures do not match real offsets in the running kernel. During BPF program load/verification loads and stores to the fields of these types are rewritten so that offsets match real offsets. For this rewrite to happen static offsets have to be encoded in the instructions.
See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux kernel source tree for details.
- runtime environment might disallow access to the field of the type through modified pointers.
During BPF program verification a tag `PTR_TO_CTX` is tracked for register values. In case if register with such tag is modified BPF programs are not allowed to read or write memory using register. See kernel/bpf/verifier.c:check_mem_access function in the Linux kernel source tree for details.
Access to the structure fields is translated to IR as a sequence:
- `(load (getelementptr %ptr %offset))` or
- `(store (getelementptr %ptr %offset))`
During instruction selection phase such sequences are translated as a
single load instruction with embedded offset, e.g. `LDW %ptr, %offset`,
which matches access pattern necessary for the restricted
set of types described above (when `%offset` is static).
Multiple optimizer passes might separate these instructions, this
includes:
- SimplifyCFGPass (sinking)
- InstCombine (sinking)
- GVN (hoisting)
The `btf_decl_tag("ctx")` attribute marks structures for which the
following transformations happen:
- at the early IR processing stage:
- `(load (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.load`;
- `(store (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.store`;
- at the late IR processing stage this modification is undone.
Such handling prevents various optimizer passes from generating
sequences of instructions that would be rejected by BPF verifier.
The __attribute__((btf_decl_tag("ctx"))) has a priority over
__attribute__((preserve_access_index)). When "ctx" attribute is
present preserve access index transformations are not applied.
To ensure this, handling of the BTFDeclTagAttr in SemaDeclAttr.cpp is
updated to recursively propagate "ctx" attribute to nested structure
declarations as it is done for BPFPreserveAccessIndexAttr.
This addresses the issue reported by the following thread:
https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D133361
Files:
clang/lib/CodeGen/CGExpr.cpp
clang/lib/Sema/SemaDeclAttr.cpp
clang/test/CodeGen/btf-decl-tag-ctx-arr.c
clang/test/CodeGen/btf-decl-tag-ctx-bitfield.c
clang/test/CodeGen/btf-decl-tag-ctx-lvalue.c
clang/test/CodeGen/btf-decl-tag-ctx-non-bpf.c
clang/test/CodeGen/btf-decl-tag-ctx-pai.c
clang/test/Sema/btf-decl-tag-ctx-ast.c
llvm/include/llvm/IR/Intrinsics.td
llvm/include/llvm/IR/IntrinsicsBPF.td
llvm/lib/Target/BPF/BPF.h
llvm/lib/Target/BPF/BPFCheckAndAdjustIR.cpp
llvm/lib/Target/BPF/BPFContextMarker.cpp
llvm/lib/Target/BPF/BPFTargetMachine.cpp
llvm/lib/Target/BPF/CMakeLists.txt
llvm/test/CodeGen/BPF/decl-tag-ctx/load-align.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-atomic.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-chain-2.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-chain-oob.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-chain-u8-oob.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-chain-u8-type-mismatch.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-chain-u8.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-chain.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-inline.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-non-const.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-simple.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-undo-align.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-undo-chain-oob.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-undo-chain-u8.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-undo-chain.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-undo-simple.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-undo-volatile.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-unroll-inline.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-unroll.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-volatile.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/load-zero.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-align.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-atomic.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-chain-2.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-chain-oob.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-chain-u8-oob.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-chain-u8.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-chain.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-simple.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-undo-align.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-undo-chain-oob.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-undo-chain-u8.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-undo-chain.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-undo-simple.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-undo-volatile.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-unroll-inline.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-volatile.ll
llvm/test/CodeGen/BPF/decl-tag-ctx/store-zero.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D133361.536643.patch
Type: text/x-patch
Size: 148553 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20230703/cb186ae9/attachment-0001.bin>
More information about the cfe-commits
mailing list