[cfe-dev] BPF: adding new clang extension bpf_dominating_decl attribute

Aaron Ballman via cfe-dev cfe-dev at lists.llvm.org
Thu Jan 6 04:47:20 PST 2022


On Wed, Jan 5, 2022 at 3:31 PM Y Song <ys114321 at gmail.com> wrote:
>
> On Mon, Jan 3, 2022 at 12:52 PM Aaron Ballman <aaron at aaronballman.com> wrote:
> >
> > On Mon, Dec 20, 2021 at 7:06 PM Y Song <ys114321 at gmail.com> wrote:
> > >
> > > This is a request to add a clang extention, more specificly,
> > > a clang attribute named bpf_dominating_decl. This clang
> > > extention is intended to be used for bpf target only. Below
> > > I will explain in detail about this proposed attribute, why
> > > bpf community needs this, how it will be used and other
> > > aspects as described in https://clang.llvm.org/get_involved.html.
> >
> > Thank you for this RFC!
>
> You are welcome!
>
> >
> > > Evidence of a significant user community
> > > ========================================
> > >
> > > We are proposing a new clang attribute bpf_dominating_decl which
> > > was implemented in [1]. The feature has also been discussed in
> > > cfe-dev mailing list ([2]). It intended to solve the
> > > following use case:
> > >   - A tool generated vmlinux.h is used for CO-RE (compile once,
> > >     run everywhere) use cases.
> > >   - vmlinux.h contains all kernel data structures for a particular config,
> > >     see [3] and [4] about how it is generated and why it is important.
> > >   - but vmlinux.h may have type conflicts with other headers
> > >     user intends to use.
> > >
> > > Macros are such an example. Currently CO-RE relocation cannot
> > > handle macros and macros may be defined in some header files accessible
> > > to the user. If those header files have type conflict with vmlinux.h,
> > > users are forced to copy macro definitions. The same for some simple
> > > static inline functions defined in header files. This issue has been
> > > discussed before and that is why we proposed this issue. And just last
> > > week, it is discussed/complained again ([5]) for not able to use
> > > some non-kernel types with a header file which has some type conflicts
> > > with vmlinux.h.
> > >
> > > If it is accepted, the attribute will be used inside the vmlinux.h and
> > > it will be used by virtually all bpf developers and it will make bpf devlopers
> > > more productive by not copying macros, static inline functions or
> > > non-kernel types.
> >
> > I'm uncomfortable with this attribute. Typically, attributes extend
> > rather than redefine the language. e.g., you might add attributes for
> > better performance or diagnostic characteristics, but you typically
> > should not use an attribute to redefine the basic premises of the
> > language.
> >
> > In this particular case, the attribute is used to tell the compiler to
> > ignore type redefinition errors and instead pick a "dominating"
> > declaration for the type. While C isn't as type sensitive as C++ is,
> > it still has _Generic, __typeof__, and other tricks that can expose
> > type system shenanigans like this in surprising ways. Given that type
> > size information is critical for many things in C (memcpy, memcmp,
> > pointer arithmetic with offsetof, etc), I'm uncomfortable with the
> > security aspects of the likely type confusion stemming from this being
> > so novel in C.
>
> To limit the potential impact. As RFC suggested, we can limit the
> impact only for CO-RE relocatable types. bpf developers are already
> aware and know how to use properly builtin's for CO-RE relocatable
> types and the types we are targeting are also CO-RE relocatable types.

Limiting this to just the target and just for specific types will
certainly help, but doesn't really eliminate the fact that this
attribute is definitely not very C-like in what it does. As mentioned
on the code review, we have to do some interesting work to ensure we
emit the correct diagnostics for conformance to C (or, alternatively,
document that this target is not a C target, but that leads right back
to my argument that this is making a new language rather than
extending an existing one).

> For example, type size, in
> https://github.com/torvalds/linux/blob/master/tools/lib/bpf/bpf_core_read.h
> we have the following macro:
>
> #define bpf_core_type_size(type)                                            \
>         __builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_SIZE)
>
> So users can get the type size for a particular kernel. Note that the type
> might have different sizes for different kernels.
>
> For offsetof issue, the bpf_core_read.h provides the following macro:
>
> #define BPF_CORE_READ(src, a, ...) ({                                       \
>         ___type((src), a, ##__VA_ARGS__) __r;                               \
>         BPF_CORE_READ_INTO(&__r, (src), a, ##__VA_ARGS__);                  \
>         __r;                                                                \
> })
>
> which eventually uses the builtin __builtin_preserve_access_index()
> so bpfloader can adjust the offsetof properly.
>
> So for relocatable types, user won't use typeof or offsetof.

Will their use be diagnosed for BPF targets?

> Otherwise, programs won't be portable even without bpf_dominating_decl
> attribute.
>
> >
> > That said, we do have *one* attribute that I consider to be a
> > "redefine the language in fundamental ways" feature --
> > [[clang::overloadable]] allows you to define overload sets in C, which
> > is a distinctly not-C thing to do because of the name mangling
> > involved. However, that attribute introduces the C++ semantics into C
> > whereas the BPF dominating declaration attribute is introducing wholly
> > novel semantics. So I don't really consider [[clang::overloadable]] as
> > direct precedent for this.
> >
> > >
> > > A specific need to reside within the Clang tree
> > > ===============================================
> > >
> > > The proposed attribute will be processed by Clang frontend lex and
> > > sema and it would be
> > > best to reside within the Clang tree.
> >
> > Would it be plausible/appropriate to instead run the source code
> > through a processing tool which emits modified source code with the
> > correct definitions instead of hoping this dominating declaration
> > works out? In this case, I think the user will get better diagnostic
>
> Theoretically it is possible. We need to have a preprocessor, parse
> the program, including all
> include files, do exactly the https://reviews.llvm.org/D111307 has
> done to ignore
> those duplicated relocatable types and generate a .i file and feed into clang.
> But this duplicates a lot of current clang code and the tool itself cannot
> automatically benefit from future clang improvements. So I think in-tree
> support is the best option, least maintenance burden.

I think Clang's architecture as a series of libraries provides
extensive support for building your own tooling to perform these kinds
of code transformations. For example, clang-tools-extra has
clang-change-namespace, clang-include-fixer, clang-reorder-fields, etc
and they all make use of Clang as a library without needing to modify
Clang itself. I think it would be reasonable to explore the idea of
adding such a tool to perform the rewriting for you (it could
potentially even live in-tree) as you would continue to use the
existing Clang code and still benefit from future Clang improvements.

> > behavior in the places where there *is* type confusion the compiler
> > can detect, but the user will at least have an easier time *debugging*
> > any problems from this.
> >
> > Which brings up an interesting question -- how does this attribute
> > interact with debug information?
>
> I tried a simple example. The debuginfo does not contain *ignored* types.
> And this is exactly what we want.
>
> [yhs at devbig309.ftw3 ~/work/tests/llvm/dom] cat t.h
> #pragma GCC system_header
>
> #pragma clang attribute push(__attribute__((bpf_dominating_decl)), \
>                              apply_to = any(record, enum, type_alias))
> struct bob {
>   int a;
>   union {
>     int i;
>   };
> };
> #pragma clang attribute pop
> struct bob {
>   int a;
> };
> $ cat t.c
> #include "t.h"
>
> int foo(struct bob *arg) {
>   return arg->a;
> }
> $ clang -target bpf -O2 -g -c t.c
> $ llvm-dwarfdump t.o
> t.o:    file format elf64-bpf
>
> .debug_info contents:
> 0x00000000: Compile Unit: length = 0x00000088, format = DWARF32,
> version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at
> 0x0000008c)
>
> 0x0000000b: DW_TAG_compile_unit
>               DW_AT_producer    ("clang version 14.0.0
> (https://github.com/llvm/llvm-project.git
> 81703ebc2a7673de5b6cd71f8a6526beb1aee4ae)")
>               DW_AT_language    (DW_LANG_C99)
>               DW_AT_name        ("t.c")
>               DW_AT_stmt_list   (0x00000000)
>               DW_AT_comp_dir    ("/home/yhs/work/tests/llvm/dom")
>               DW_AT_low_pc      (0x0000000000000000)
>               DW_AT_high_pc     (0x0000000000000010)
>
> 0x0000002a:   DW_TAG_subprogram
>                 DW_AT_low_pc    (0x0000000000000000)
>                 DW_AT_high_pc   (0x0000000000000010)
>                 DW_AT_frame_base        (DW_OP_reg10 W10)
>                 DW_AT_GNU_all_call_sites        (true)
>                 DW_AT_name      ("foo")
>                 DW_AT_decl_file ("/home/yhs/work/tests/llvm/dom/t.c")
>                 DW_AT_decl_line (3)
>                 DW_AT_prototyped        (true)
>                 DW_AT_type      (0x00000051 "int")
>                 DW_AT_external  (true)
>
> 0x00000043:     DW_TAG_formal_parameter
>                   DW_AT_location        (DW_OP_reg1 W1)
>                   DW_AT_name    ("arg")
>                   DW_AT_decl_file       ("/home/yhs/work/tests/llvm/dom/t.c")
>                   DW_AT_decl_line       (3)
>                   DW_AT_type    (0x00000058 "bob *")
>
> 0x00000050:     NULL
>
> 0x00000051:   DW_TAG_base_type
>                 DW_AT_name      ("int")
>                 DW_AT_encoding  (DW_ATE_signed)
>                 DW_AT_byte_size (0x04)
>
> 0x00000058:   DW_TAG_pointer_type
>                 DW_AT_type      (0x0000005d "bob")
>
> 0x0000005d:   DW_TAG_structure_type
>                 DW_AT_name      ("bob")
>                 DW_AT_byte_size (0x08)
>                 DW_AT_decl_file ("/home/yhs/work/tests/llvm/dom/./t.h")
>                 DW_AT_decl_line (5)
>
> 0x00000065:     DW_TAG_member
>                   DW_AT_name    ("a")
>                   DW_AT_type    (0x00000051 "int")
>                   DW_AT_decl_file       ("/home/yhs/work/tests/llvm/dom/./t.h")
>                   DW_AT_decl_line       (6)
>                   DW_AT_data_member_location    (0x00)
>
> 0x00000071:     DW_TAG_member
>                   DW_AT_type    (0x00000079 "bob::union ")
>                   DW_AT_decl_file       ("/home/yhs/work/tests/llvm/dom/./t.h")
>                   DW_AT_decl_line       (7)
>                   DW_AT_data_member_location    (0x04)
> 0x00000079:     DW_TAG_union_type
>                   DW_AT_byte_size       (0x04)
>                   DW_AT_decl_file       ("/home/yhs/work/tests/llvm/dom/./t.h")
>                   DW_AT_decl_line       (7)
>
> 0x0000007d:       DW_TAG_member
>                     DW_AT_name  ("i")
>                     DW_AT_type  (0x00000051 "int")
>                     DW_AT_decl_file     ("/home/yhs/work/tests/llvm/dom/./t.h")
>                     DW_AT_decl_line     (8)
>                     DW_AT_data_member_location  (0x00)
>
> 0x00000089:       NULL
>
> 0x0000008a:     NULL
>
> 0x0000008b:   NULL

Fantastic, thank you for verifying!

~Aaron

>
> >
> > ~Aaron
> >
> > >
> > > A specification
> > > ===============
> > >
> > > The bpf_dominating_decl attribute, as its name suggested, is a bpf target
> > > specific attribute. The attribute is also for C language only.
> > >
> > > The syntax:
> > >    __attribute__((bpf_dominating_decl))
> > > The attribute can be attached to:
> > >    record, typedef and enum type declarations
> > >
> > > If a type declaration is annotated with bpf_dominating_decl attribute,
> > > that type declaration is considered as a dominating decl and later
> > > any conflict declaration, compatible or not-compatible,
> > > will be silently ignored.
> > >
> > > The following is a simple example,
> > >
> > >   #pragma clang attribute push(__attribute__((bpf_dominating_decl)), \
> > >                                apply_to = any(record, enum, type_alias))
> > >   typedef unsigned bob;
> > >   #pragma clang attribute pop
> > >   typedef int bob;
> > >
> > > In the above example, the second typedef will not incur a compilation
> > > error. The same for record types or enum values.
> > >
> > > Representation within the appropriate governing organization
> > > ============================================================
> > >
> > > N/A
> > >
> > > A long-term support plan
> > > ========================
> > >
> > > The feature will be supported for ever.
> > >
> > > A high-quality implementation
> > > =============================
> > >
> > > The feature has been implemented in [1] and has been code reviewed.
> > > One thing not in [1] but in [1]'s comment is to link
> > > bpf_dominating_decl attribute to CO-RE preserve_access_index attribute
> > > in clang frontend. That is, a type is considered dominating only if
> > > the type has both bpf_dominating_decl and preserve_access_index attributes.
> > > We can amend the implementation with this if agreed upon.
> > >
> > > A test suite
> > > ============
> > >
> > > The [1] has implemented necessary tests for the feature.
> > >
> > > References
> > > ==========
> > >   [1] https://reviews.llvm.org/D111307
> > >   [2] https://lists.llvm.org/pipermail/cfe-dev/2021-August/068696.html
> > >   [3] https://www.grant.pizza/blog/vmlinux-header/
> > >   [4] https://blog.aquasec.com/vmlinux.h-ebpf-programs
> > >   [5] https://lore.kernel.org/bpf/CAEf4BzZ1K9uQ-K1Q2BCSBesR3RUj_NW8uHu6NduoX7uLBdfukQ@mail.gmail.com/


More information about the cfe-dev mailing list