[PATCH] D133361: [BPF] Attribute btf_decl_tag("ctx") for structs

Eduard Zingerman via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Sep 4 04:55:07 PDT 2023


eddyz87 added a comment.

After retesting the kernel build with LLVM=1 and a libbpf patch <https://github.com/eddyz87/bpf/tree/llvm-d133361-ctx> to reconstruct btf_decl_tag [1], the statistics for BPF selftests look as follows:

- out of 653 object files, 13 differ between builds with and w/o this change;
- for 2 programs there is a small instruction count increase (+2 insn total);
- for 5 programs there is a small instruction count decrease (-6 insn total);
- 6 programs differ slightly but the number of instructions is the same.

(The differences are insignificant, the rest of the comment could be skipped as it is probably not interesting for anyone but me).

---

Differences for the first 5 programs were already described in this <https://reviews.llvm.org/D133361#4467348> comment. The rest of the differences are described below.

netns_cookie_prog.bpf.o
-----------------------

Without ctx: 46 instructions
With ctx: 46 instructions

Instruction reordering:

   <get_netns_cookie_sk_msg>:
   	r6 = r1
  -	r2 = *(u64 *)(r6 + 0x48)
   	r1 = *(u32 *)(r6 + 0x10)
  -	if w1 != 0xa goto +0xb <LBB1_4>
  +	if w1 != 0xa goto +0xc <LBB1_4>
  +	r2 = *(u64 *)(r6 + 0x48)
   	if r2 == 0x0 goto +0xa <LBB1_4>
   	r1 = 0x0 ll
   	r3 = 0x0

The difference is introduced by the "Machine code sinking" transformation. Before the transformation both the 0x48 and 0x10 loads reside in the same basic block:

  ;; Old:
  bb.0.entry:
    ...
    %0:gpr = CORE_LD64 345, %2:gpr, @"llvm.sk_msg_md:0:72$0:10:0"
    %9:gpr32 = CORE_LD32 350, %2:gpr, @"llvm.sk_msg_md:0:16$0:2"
    JNE_ri_32 killed %9:gpr32, 10, %bb.3
  
  ;; New:
  bb.0.entry:
    ...
    %0:gpr = LDD %2:gpr, 72
    %3:gpr32 = LDW32 %2:gpr, 16
    JNE_ri_32 killed %3:gpr32, 10, %bb.3

Note: CORE pseudo-instructions are replaced by regular loads because btf_decl_tag("ctx") has priority over preserve_access_index attribute. The "Machine code sinking" transformation (MachineSink.cpp) can move `LDD`, `LDW` instructions, but can't move `CORE_LD*` because `CORE_LD*` instructions are marked as `MCID::UnmodeledSideEffects` in `BPFGenInstrInfo.inc` (maybe something to adjust):

  // called from MachineSinking::SinkInstruction
  bool MachineInstr::isSafeToMove(AAResults *AA, bool &SawStore) const {
    if (... hasUnmodeledSideEffects())
      return false;
    ...
  }
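
To make the sinking restriction concrete, here is a minimal sketch in plain C. It is a toy model, not LLVM code: `struct insn_model`, `is_safe_to_move_model` and `count_sinkable` are hypothetical names, and only the single early exit quoted above is modelled (the real `isSafeToMove` checks more conditions).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record; not LLVM's MachineInstr. */
struct insn_model {
	const char *opcode;
	bool unmodeled_side_effects; /* stands in for MCID::UnmodeledSideEffects */
};

/* Models only the early exit quoted above: flagged instructions never move. */
bool is_safe_to_move_model(const struct insn_model *mi)
{
	assert(mi != NULL);
	return !mi->unmodeled_side_effects;
}

/* Counts how many instructions in a block a sinking pass could consider. */
size_t count_sinkable(const struct insn_model *insns, size_t n)
{
	size_t sinkable = 0;

	for (size_t i = 0; i < n; i++)
		if (is_safe_to_move_model(&insns[i]))
			sinkable++;
	return sinkable;
}
```

With the flag set for a `CORE_LD64` entry and cleared for an `LDD` entry, only the `LDD` entry is reported as movable, which matches the reordering seen in the listing.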



sock_destroy_prog.bpf.o
-----------------------

Without ctx: 102 instructions
With ctx: 101 instructions

In the following code fragment:

  	if (ctx->protocol == IPPROTO_TCP)
  		bpf_map_update_elem(&tcp_conn_sockets, &key, &sock_cookie, 0);
  	else if (ctx->protocol == IPPROTO_UDP)
  		bpf_map_update_elem(&udp_conn_sockets, &keyc, &sock_cookie, 0);
  	else
  		return 1;

The version w/o btf_decl_tag("ctx") keeps two loads of `ctx->protocol` because of the llvm.bpf.passthrough call. The version with btf_decl_tag("ctx") eliminates the second load via a combination of the EarlyCSEPass/InstCombinePass/SimplifyCFGPass passes.
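
The effect can be illustrated at source level with a small C model. This is only a sketch: `struct ctx_model`, `load_protocol` and the two `classify_*` functions are hypothetical, and the explicit load counter stands in for the loads that remain in the generated code. `MODEL_TCP` and `MODEL_UDP` mirror the values of IPPROTO_TCP and IPPROTO_UDP.

```c
#include <assert.h>
#include <stdint.h>

enum { MODEL_TCP = 6, MODEL_UDP = 17 };

/* Hypothetical context: one field plus a counter of loads performed. */
struct ctx_model {
	uint32_t protocol;
	unsigned loads;
};

/* Every call models one load of ctx->protocol. */
uint32_t load_protocol(struct ctx_model *ctx)
{
	assert(ctx != 0);
	ctx->loads++;
	return ctx->protocol;
}

/* Shape of the code when the second load is kept. */
int classify_two_loads(struct ctx_model *ctx)
{
	if (load_protocol(ctx) == MODEL_TCP)
		return 0;
	else if (load_protocol(ctx) == MODEL_UDP)
		return 0;
	return 1;
}

/* Shape of the code after the redundant load is eliminated. */
int classify_one_load(struct ctx_model *ctx)
{
	uint32_t proto = load_protocol(ctx);

	if (proto == MODEL_TCP || proto == MODEL_UDP)
		return 0;
	return 1;
}
```

Both functions return the same result for any protocol value; only the number of loads on the non-TCP path differs.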

socket_cookie_prog.bpf.o
------------------------

Without ctx: 66 instructions
With ctx: 66 instructions

For the following C code fragment:

  SEC("sockops")
  int update_cookie_sockops(struct bpf_sock_ops *ctx)
  {
  	struct bpf_sock *sk = ctx->sk;
  	struct socket_cookie *p;
  
  	if (ctx->family != AF_INET6)
  		return 1;
  
  	if (ctx->op != BPF_SOCK_OPS_TCP_CONNECT_CB)
  		return 1;
  
  	if (!sk)
  		return 1;
      ...
  }

Code with decl_tag("ctx") reorders the ctx->sk load relative to the ctx->family and ctx->op loads:

  //  old                                       new
   <update_cookie_sockops>:                  <update_cookie_sockops>:
      r6 = r1                                   r6 = r1
  -   r2 = *(u64 *)(r6 + 0xb8)
      r1 = *(u32 *)(r6 + 0x14)                  r1 = *(u32 *)(r6 + 0x14)
      if w1 != 0xa goto +0x13 <LBB1_6>          if w1 != 0xa goto +0x14 <LBB1_6>
      r1 = *(u32 *)(r6 + 0x0)                   r1 = *(u32 *)(r6 + 0x0)
      if w1 != 0x3 goto +0x11 <LBB1_6>          if w1 != 0x3 goto +0x12 <LBB1_6>
                                            +   r2 = *(u64 *)(r6 + 0xb8)
      if r2 == 0x0 goto +0x10 <LBB1_6>          if r2 == 0x0 goto +0x10 <LBB1_6>
      r1 = 0x0 ll                               r1 = 0x0 ll
      r3 = 0x0                                  r3 = 0x0

Code w/o decl_tag("ctx") uses `CORE_LD*` instructions for these loads and does not reorder them, for the same reasons as in netns_cookie_prog.bpf.o.
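
The changed jump offsets in the listing follow from the moved load alone. BPF jump offsets are relative to the instruction after the jump, so the target slot is `index + 1 + offset`. A minimal sketch, assuming `r6 = r1` is instruction 0 of the function (`jump_target` is a hypothetical helper):

```c
#include <assert.h>

/*
 * BPF jump offsets are counted from the instruction that follows the
 * jump, so the target slot is the jump's own index + 1 + offset.
 */
int jump_target(int jump_index, int offset)
{
	assert(jump_index >= 0);
	return jump_index + 1 + offset;
}
```

In the old listing the three jumps sit at indices 3, 5 and 6 with offsets 0x13, 0x11 and 0x10. In the new listing they sit at indices 2, 4 and 6 with offsets 0x14, 0x12 and 0x10. All six resolve to the same slot, so LBB1_6 itself does not move; only the load does.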

test_lwt_reroute.bpf.o
----------------------

Without ctx: 18 instructions
With ctx: 17 instructions

The difference boils down to EarlyCSEPass being able to remove the last load in the store/load pair:

  ; Before EarlyCSEPass
    store i32 %and, ptr %mark24, align 8
    %mark25 = getelementptr inbounds %struct.__sk_buff, ptr %skb, i32 0, i32 2
    %19 = load i32, ptr %mark25, align 8
    %cmp26 = icmp eq i32 %19, 0
  ; After EarlyCSEPass
    %and = and i32 %cond, 255
    store i32 %and, ptr %mark, align 8
    %cmp26 = icmp eq i32 %and, 0

It is unable to do so when get.element.and.{store,load} intrinsics are used, which leads to slight codegen differences downstream.
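
A source-level view of this store-to-load forwarding, as a minimal C sketch. `struct skb_model` is a hypothetical stand-in whose `mark` member sits at field index 2, as in the IR above; the function names are illustrative as well.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in; mark is field index 2, as in the GEP above. */
struct skb_model {
	uint32_t len;
	uint32_t pkt_type;
	uint32_t mark;
};

/* Shape before the pass: store to mark, then load it back to compare. */
int mark_is_zero_reload(struct skb_model *skb, uint32_t cond)
{
	assert(skb != 0);
	skb->mark = cond & 255;
	return skb->mark == 0;
}

/* Shape after the pass: the stored value is compared directly. */
int mark_is_zero_forwarded(struct skb_model *skb, uint32_t cond)
{
	uint32_t and_val = cond & 255;

	assert(skb != 0);
	skb->mark = and_val;
	return and_val == 0;
}
```

Both variants store the same value and return the same result; the second one simply has no load left to emit.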

test_sockmap_invalid_update.bpf.o
---------------------------------

Without ctx: 13 instructions
With ctx: 12 instructions

In the following C fragment:

  	if (skops->sk)
  		bpf_map_update_elem(&map, &key, skops->sk, 0);

Code with decl_tag("ctx") loads skops->sk only once. Code w/o decl_tag("ctx") uses CO-RE relocations and loads it twice. As with sock_destroy_prog.bpf.o, EarlyCSEPass does not consolidate identical `%x = call llvm.bpf.passthrough; load %x` pairs.

type_cast.bpf.o
---------------

Without ctx: 96 instructions
With ctx: 96 instructions

`__builtin_memcpy(name, dev->name, IFNAMSIZ)` is unrolled in a different order. No idea why.

core_kern.bpf.o, test_verif_scale2.bpf.o
----------------------------------------

For both programs the number of instructions is unchanged (11249, 12286). Some instructions are ordered differently after DAG->DAG Pattern Instruction Selection. Instruction selection with CO-RE and non-CO-RE loads produces slightly different results.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133361/new/

https://reviews.llvm.org/D133361


