[cfe-dev] BPF: adding new clang extension bpf_dominating_decl attribute

Y Song via cfe-dev cfe-dev at lists.llvm.org
Wed Jan 5 12:26:06 PST 2022


On Mon, Jan 3, 2022 at 12:52 PM Aaron Ballman <aaron at aaronballman.com> wrote:
>
> On Mon, Dec 20, 2021 at 7:06 PM Y Song <ys114321 at gmail.com> wrote:
> >
> > This is a request to add a clang extention, more specificly,
> > a clang attribute named bpf_dominating_decl. This clang
> > extention is intended to be used for bpf target only. Below
> > I will explain in detail about this proposed attribute, why
> > bpf community needs this, how it will be used and other
> > aspects as described in https://clang.llvm.org/get_involved.html.
>
> Thank you for this RFC!

You are welcome!

>
> > Evidence of a significant user community
> > ========================================
> >
> > We are proposing a new clang attribute bpf_dominating_decl which
> > was implemented in [1]. The feature has also been discussed in
> > cfe-dev mailing list ([2]). It intended to solve the
> > following use case:
> >   - A tool generated vmlinux.h is used for CO-RE (compile once,
> >     run everywhere) use cases.
> >   - vmlinux.h contains all kernel data structures for a particular config,
> >     see [3] and [4] about how it is generated and why it is important.
> >   - but vmlinux.h may have type conflicts with other headers
> >     user intends to use.
> >
> > Macros are such an example. Currently CO-RE relocation cannot
> > handle macros and macros may be defined in some header files accessible
> > to the user. If those header files have type conflict with vmlinux.h,
> > users are forced to copy macro definitions. The same for some simple
> > static inline functions defined in header files. This issue has been
> > discussed before and that is why we proposed this issue. And just last
> > week, it is discussed/complained again ([5]) for not able to use
> > some non-kernel types with a header file which has some type conflicts
> > with vmlinux.h.
> >
> > If it is accepted, the attribute will be used inside the vmlinux.h and
> > it will be used by virtually all bpf developers and it will make bpf devlopers
> > more productive by not copying macros, static inline functions or
> > non-kernel types.
>
> I'm uncomfortable with this attribute. Typically, attributes extend
> rather than redefine the language. e.g., you might add attributes for
> better performance or diagnostic characteristics, but you typically
> should not use an attribute to redefine the basic premises of the
> language.
>
> In this particular case, the attribute is used to tell the compiler to
> ignore type redefinition errors and instead pick a "dominating"
> declaration for the type. While C isn't as type sensitive as C++ is,
> it still has _Generic, __typeof__, and other tricks that can expose
> type system shenanigans like this in surprising ways. Given that type
> size information is critical for many things in C (memcpy, memcmp,
> pointer arithmetic with offsetof, etc), I'm uncomfortable with the
> security aspects of the likely type confusion stemming from this being
> so novel in C.

To limit the potential impact. As RFC suggested, we can limit the
impact only for CO-RE relocatable types. bpf developers are already
aware and know how to use properly builtin's for CO-RE relocatable
types and the types we are targeting are also CO-RE relocatable types.

For example, type size, in
https://github.com/torvalds/linux/blob/master/tools/lib/bpf/bpf_core_read.h
we have the following macro:

#define bpf_core_type_size(type)                                            \
        __builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_SIZE)

So users can get the type size for a particular kernel. Note that the type
might have different sizes for different kernels.

For offsetof issue, the bpf_core_read.h provides the following macro:

#define BPF_CORE_READ(src, a, ...) ({                                       \
        ___type((src), a, ##__VA_ARGS__) __r;                               \
        BPF_CORE_READ_INTO(&__r, (src), a, ##__VA_ARGS__);                  \
        __r;                                                                \
})

which eventually uses the builtin __builtin_preserve_access_index()
so bpfloader can adjust the offsetof properly.

So for relocatable types, user won't use typeof or offsetof.
Otherwise, programs won't be portable even without bpf_dominating_decl
attribute.

>
> That said, we do have *one* attribute that I consider to be a
> "redefine the language in fundamental ways" feature --
> [[clang::overloadable]] allows you to define overload sets in C, which
> is a distinctly not-C thing to do because of the name mangling
> involved. However, that attribute introduces the C++ semantics into C
> whereas the BPF dominating declaration attribute is introducing wholly
> novel semantics. So I don't really consider [[clang::overloadable]] as
> direct precedent for this.
>
> >
> > A specific need to reside within the Clang tree
> > ===============================================
> >
> > The proposed attribute will be processed by Clang frontend lex and
> > sema and it would be
> > best to reside within the Clang tree.
>
> Would it be plausible/appropriate to instead run the source code
> through a processing tool which emits modified source code with the
> correct definitions instead of hoping this dominating declaration
> works out? In this case, I think the user will get better diagnostic

Theoretically it is possible. We need to have a preprocessor, parse
the program, including all
include files, do exactly the https://reviews.llvm.org/D111307 has
done to ignore
those duplicated relocatable types and generate a .i file and feed into clang.
But this duplicates a lot of current clang code and the tool itself cannot
automatically benefit from future clang improvements. So I think in-tree
support is the best option, least maintenance burden.

> behavior in the places where there *is* type confusion the compiler
> can detect, but the user will at least have an easier time *debugging*
> any problems from this.
>
> Which brings up an interesting question -- how does this attribute
> interact with debug information?

I tried a simple example. The debuginfo does not contain *ignored* types.
And this is exactly what we want.

[yhs at devbig309.ftw3 ~/work/tests/llvm/dom] cat t.h
#pragma GCC system_header

#pragma clang attribute push(__attribute__((bpf_dominating_decl)), \
                             apply_to = any(record, enum, type_alias))
struct bob {
  int a;
  union {
    int i;
  };
};
#pragma clang attribute pop
struct bob {
  int a;
};
$ cat t.c
#include "t.h"

int foo(struct bob *arg) {
  return arg->a;
}
$ clang -target bpf -O2 -g -c t.c
$ llvm-dwarfdump t.o
t.o:    file format elf64-bpf

.debug_info contents:
0x00000000: Compile Unit: length = 0x00000088, format = DWARF32,
version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at
0x0000008c)

0x0000000b: DW_TAG_compile_unit
              DW_AT_producer    ("clang version 14.0.0
(https://github.com/llvm/llvm-project.git
81703ebc2a7673de5b6cd71f8a6526beb1aee4ae)")
              DW_AT_language    (DW_LANG_C99)
              DW_AT_name        ("t.c")
              DW_AT_stmt_list   (0x00000000)
              DW_AT_comp_dir    ("/home/yhs/work/tests/llvm/dom")
              DW_AT_low_pc      (0x0000000000000000)
              DW_AT_high_pc     (0x0000000000000010)

0x0000002a:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000000000000)
                DW_AT_high_pc   (0x0000000000000010)
                DW_AT_frame_base        (DW_OP_reg10 W10)
                DW_AT_GNU_all_call_sites        (true)
                DW_AT_name      ("foo")
                DW_AT_decl_file ("/home/yhs/work/tests/llvm/dom/t.c")
                DW_AT_decl_line (3)
                DW_AT_prototyped        (true)
                DW_AT_type      (0x00000051 "int")
                DW_AT_external  (true)

0x00000043:     DW_TAG_formal_parameter
                  DW_AT_location        (DW_OP_reg1 W1)
                  DW_AT_name    ("arg")
                  DW_AT_decl_file       ("/home/yhs/work/tests/llvm/dom/t.c")
                  DW_AT_decl_line       (3)
                  DW_AT_type    (0x00000058 "bob *")

0x00000050:     NULL

0x00000051:   DW_TAG_base_type
                DW_AT_name      ("int")
                DW_AT_encoding  (DW_ATE_signed)
                DW_AT_byte_size (0x04)

0x00000058:   DW_TAG_pointer_type
                DW_AT_type      (0x0000005d "bob")

0x0000005d:   DW_TAG_structure_type
                DW_AT_name      ("bob")
                DW_AT_byte_size (0x08)
                DW_AT_decl_file ("/home/yhs/work/tests/llvm/dom/./t.h")
                DW_AT_decl_line (5)

0x00000065:     DW_TAG_member
                  DW_AT_name    ("a")
                  DW_AT_type    (0x00000051 "int")
                  DW_AT_decl_file       ("/home/yhs/work/tests/llvm/dom/./t.h")
                  DW_AT_decl_line       (6)
                  DW_AT_data_member_location    (0x00)

0x00000071:     DW_TAG_member
                  DW_AT_type    (0x00000079 "bob::union ")
                  DW_AT_decl_file       ("/home/yhs/work/tests/llvm/dom/./t.h")
                  DW_AT_decl_line       (7)
                  DW_AT_data_member_location    (0x04)
0x00000079:     DW_TAG_union_type
                  DW_AT_byte_size       (0x04)
                  DW_AT_decl_file       ("/home/yhs/work/tests/llvm/dom/./t.h")
                  DW_AT_decl_line       (7)

0x0000007d:       DW_TAG_member
                    DW_AT_name  ("i")
                    DW_AT_type  (0x00000051 "int")
                    DW_AT_decl_file     ("/home/yhs/work/tests/llvm/dom/./t.h")
                    DW_AT_decl_line     (8)
                    DW_AT_data_member_location  (0x00)

0x00000089:       NULL

0x0000008a:     NULL

0x0000008b:   NULL

>
> ~Aaron
>
> >
> > A specification
> > ===============
> >
> > The bpf_dominating_decl attribute, as its name suggested, is a bpf target
> > specific attribute. The attribute is also for C language only.
> >
> > The syntax:
> >    __attribute__((bpf_dominating_decl))
> > The attribute can be attached to:
> >    record, typedef and enum type declarations
> >
> > If a type declaration is annotated with bpf_dominating_decl attribute,
> > that type declaration is considered as a dominating decl and later
> > any conflict declaration, compatible or not-compatible,
> > will be silently ignored.
> >
> > The following is a simple example,
> >
> >   #pragma clang attribute push(__attribute__((bpf_dominating_decl)), \
> >                                apply_to = any(record, enum, type_alias))
> >   typedef unsigned bob;
> >   #pragma clang attribute pop
> >   typedef int bob;
> >
> > In the above example, the second typedef will not incur a compilation
> > error. The same for record types or enum values.
> >
> > Representation within the appropriate governing organization
> > ============================================================
> >
> > N/A
> >
> > A long-term support plan
> > ========================
> >
> > The feature will be supported for ever.
> >
> > A high-quality implementation
> > =============================
> >
> > The feature has been implemented in [1] and has been code reviewed.
> > One thing not in [1] but in [1]'s comment is to link
> > bpf_dominating_decl attribute to CO-RE preserve_access_index attribute
> > in clang frontend. That is, a type is considered dominating only if
> > the type has both bpf_dominating_decl and preserve_access_index attributes.
> > We can amend the implementation with this if agreed upon.
> >
> > A test suite
> > ============
> >
> > The [1] has implemented necessary tests for the feature.
> >
> > References
> > ==========
> >   [1] https://reviews.llvm.org/D111307
> >   [2] https://lists.llvm.org/pipermail/cfe-dev/2021-August/068696.html
> >   [3] https://www.grant.pizza/blog/vmlinux-header/
> >   [4] https://blog.aquasec.com/vmlinux.h-ebpf-programs
> >   [5] https://lore.kernel.org/bpf/CAEf4BzZ1K9uQ-K1Q2BCSBesR3RUj_NW8uHu6NduoX7uLBdfukQ@mail.gmail.com/


More information about the cfe-dev mailing list