[cfe-dev] BPF: adding new clang extension bpf_dominating_decl attribute

David Rector via cfe-dev cfe-dev at lists.llvm.org
Thu Jan 6 06:02:01 PST 2022



> On Jan 6, 2022, at 7:47 AM, Aaron Ballman via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> On Wed, Jan 5, 2022 at 3:31 PM Y Song <ys114321 at gmail.com> wrote:
>> 
>> On Mon, Jan 3, 2022 at 12:52 PM Aaron Ballman <aaron at aaronballman.com> wrote:
>>> 
>>> On Mon, Dec 20, 2021 at 7:06 PM Y Song <ys114321 at gmail.com> wrote:
>>>> 
>>>> This is a request to add a clang extention, more specificly,
>>>> a clang attribute named bpf_dominating_decl. This clang
>>>> extention is intended to be used for bpf target only. Below
>>>> I will explain in detail about this proposed attribute, why
>>>> bpf community needs this, how it will be used and other
>>>> aspects as described in https://clang.llvm.org/get_involved.html.
>>> 
>>> Thank you for this RFC!
>> 
>> You are welcome!
>> 
>>> 
>>>> Evidence of a significant user community
>>>> ========================================
>>>> 
>>>> We are proposing a new clang attribute bpf_dominating_decl which
>>>> was implemented in [1]. The feature has also been discussed in
>>>> cfe-dev mailing list ([2]). It intended to solve the
>>>> following use case:
>>>>  - A tool generated vmlinux.h is used for CO-RE (compile once,
>>>>    run everywhere) use cases.
>>>>  - vmlinux.h contains all kernel data structures for a particular config,
>>>>    see [3] and [4] about how it is generated and why it is important.
>>>>  - but vmlinux.h may have type conflicts with other headers
>>>>    user intends to use.
>>>> 
>>>> Macros are such an example. Currently CO-RE relocation cannot
>>>> handle macros and macros may be defined in some header files accessible
>>>> to the user. If those header files have type conflict with vmlinux.h,
>>>> users are forced to copy macro definitions. The same for some simple
>>>> static inline functions defined in header files. This issue has been
>>>> discussed before and that is why we proposed this issue. And just last
>>>> week, it is discussed/complained again ([5]) for not able to use
>>>> some non-kernel types with a header file which has some type conflicts
>>>> with vmlinux.h.
>>>> 
>>>> If it is accepted, the attribute will be used inside the vmlinux.h and
>>>> it will be used by virtually all bpf developers and it will make bpf devlopers
>>>> more productive by not copying macros, static inline functions or
>>>> non-kernel types.
>>> 
>>> I'm uncomfortable with this attribute. Typically, attributes extend
>>> rather than redefine the language. e.g., you might add attributes for
>>> better performance or diagnostic characteristics, but you typically
>>> should not use an attribute to redefine the basic premises of the
>>> language.
>>> 
>>> In this particular case, the attribute is used to tell the compiler to
>>> ignore type redefinition errors and instead pick a "dominating"
>>> declaration for the type. While C isn't as type sensitive as C++ is,
>>> it still has _Generic, __typeof__, and other tricks that can expose
>>> type system shenanigans like this in surprising ways. Given that type
>>> size information is critical for many things in C (memcpy, memcmp,
>>> pointer arithmetic with offsetof, etc), I'm uncomfortable with the
>>> security aspects of the likely type confusion stemming from this being
>>> so novel in C.
>> 
>> To limit the potential impact. As RFC suggested, we can limit the
>> impact only for CO-RE relocatable types. bpf developers are already
>> aware and know how to use properly builtin's for CO-RE relocatable
>> types and the types we are targeting are also CO-RE relocatable types.
> 
> Limiting this to just the target and just for specific types will
> certainly help, but doesn't really eliminate the fact that this
> attribute is definitely not very C-like in what it does. As mentioned
> on the code review, we have to do some interesting work to ensure we
> emit the correct diagnostics for conformance to C (or, alternatively,
> document that this target is not a C target, but that leads right back
> to my argument that this is making a new language rather than
> extending an existing one).
> 
>> For example, type size, in
>> https://github.com/torvalds/linux/blob/master/tools/lib/bpf/bpf_core_read.h
>> we have the following macro:
>> 
>> #define bpf_core_type_size(type)                                            \
>>        __builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_SIZE)
>> 
>> So users can get the type size for a particular kernel. Note that the type
>> might have different sizes for different kernels.
>> 
>> For offsetof issue, the bpf_core_read.h provides the following macro:
>> 
>> #define BPF_CORE_READ(src, a, ...) ({                                       \
>>        ___type((src), a, ##__VA_ARGS__) __r;                               \
>>        BPF_CORE_READ_INTO(&__r, (src), a, ##__VA_ARGS__);                  \
>>        __r;                                                                \
>> })
>> 
>> which eventually uses the builtin __builtin_preserve_access_index()
>> so bpfloader can adjust the offsetof properly.
>> 
>> So for relocatable types, user won't use typeof or offsetof.
> 
> Will their use be diagnosed for BPF targets?
> 
>> Otherwise, programs won't be portable even without bpf_dominating_decl
>> attribute.
>> 
>>> 
>>> That said, we do have *one* attribute that I consider to be a
>>> "redefine the language in fundamental ways" feature --
>>> [[clang::overloadable]] allows you to define overload sets in C, which
>>> is a distinctly not-C thing to do because of the name mangling
>>> involved. However, that attribute introduces the C++ semantics into C
>>> whereas the BPF dominating declaration attribute is introducing wholly
>>> novel semantics. So I don't really consider [[clang::overloadable]] as
>>> direct precedent for this.
>>> 
>>>> 
>>>> A specific need to reside within the Clang tree
>>>> ===============================================
>>>> 
>>>> The proposed attribute will be processed by Clang frontend lex and
>>>> sema and it would be
>>>> best to reside within the Clang tree.
>>> 
>>> Would it be plausible/appropriate to instead run the source code
>>> through a processing tool which emits modified source code with the
>>> correct definitions instead of hoping this dominating declaration
>>> works out? In this case, I think the user will get better diagnostic
>> 
>> Theoretically it is possible. We need to have a preprocessor, parse
>> the program, including all
>> include files, do exactly the https://reviews.llvm.org/D111307 has
>> done to ignore
>> those duplicated relocatable types and generate a .i file and feed into clang.
>> But this duplicates a lot of current clang code and the tool itself cannot
>> automatically benefit from future clang improvements. So I think in-tree
>> support is the best option, least maintenance burden.
> 
> I think Clang's architecture as a series of libraries provides
> extensive support for building your own tooling to perform these kinds
> of code transformations. For example, clang-tools-extra has
> clang-change-namespace, clang-include-fixer, clang-reorder-fields, etc
> and they all make use of Clang as a library without needing to modify
> Clang itself. I think it would be reasonable to explore the idea of
> adding such a tool to perform the rewriting for you (it could
> potentially even live in-tree) as you would continue to use the
> existing Clang code and still benefit from future Clang improvements.

The general problem, that it is laborious to solve the problem in a tool, and furthermore such a solution would require duplicating lots of existing clang functionality creating a maintenance burden, and that it is thus preferable to upstream the solution, which burdens other users to some extent, recalls the discussion about upstreaming Swift’s APINotes:
https://lists.llvm.org/pipermail/cfe-dev/2020-October/066944.html <https://lists.llvm.org/pipermail/cfe-dev/2020-October/066944.html>

Then and now, I think the best solution in these situations is to encourage/support patches which introduce new handles in ASTConsumer to customize behavior during parsing (rather than always afterward, via HandleTranslationUnit).   This would keep consumer details from leaking upstream, and upstream details from leaking into downstream consumers, lessening maintenance burdens for everyone.

In this case: suppose that wherever clang emits an error, that were changed to a call to an ASTConsumer virtual method which by default emits the error.

I think the tool would then be fairly trivial: instead of issuing a type redefinition error, your tool simply deletes or omit the declaration that caused it.

Would this solve the problem, or am I missing something?

> 
>>> behavior in the places where there *is* type confusion the compiler
>>> can detect, but the user will at least have an easier time *debugging*
>>> any problems from this.
>>> 
>>> Which brings up an interesting question -- how does this attribute
>>> interact with debug information?
>> 
>> I tried a simple example. The debuginfo does not contain *ignored* types.
>> And this is exactly what we want.
>> 
>> [yhs at devbig309.ftw3 ~/work/tests/llvm/dom] cat t.h
>> #pragma GCC system_header
>> 
>> #pragma clang attribute push(__attribute__((bpf_dominating_decl)), \
>>                             apply_to = any(record, enum, type_alias))
>> struct bob {
>>  int a;
>>  union {
>>    int i;
>>  };
>> };
>> #pragma clang attribute pop
>> struct bob {
>>  int a;
>> };
>> $ cat t.c
>> #include "t.h"
>> 
>> int foo(struct bob *arg) {
>>  return arg->a;
>> }
>> $ clang -target bpf -O2 -g -c t.c
>> $ llvm-dwarfdump t.o
>> t.o:    file format elf64-bpf
>> 
>> .debug_info contents:
>> 0x00000000: Compile Unit: length = 0x00000088, format = DWARF32,
>> version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at
>> 0x0000008c)
>> 
>> 0x0000000b: DW_TAG_compile_unit
>>              DW_AT_producer    ("clang version 14.0.0
>> (https://github.com/llvm/llvm-project.git
>> 81703ebc2a7673de5b6cd71f8a6526beb1aee4ae)")
>>              DW_AT_language    (DW_LANG_C99)
>>              DW_AT_name        ("t.c")
>>              DW_AT_stmt_list   (0x00000000)
>>              DW_AT_comp_dir    ("/home/yhs/work/tests/llvm/dom")
>>              DW_AT_low_pc      (0x0000000000000000)
>>              DW_AT_high_pc     (0x0000000000000010)
>> 
>> 0x0000002a:   DW_TAG_subprogram
>>                DW_AT_low_pc    (0x0000000000000000)
>>                DW_AT_high_pc   (0x0000000000000010)
>>                DW_AT_frame_base        (DW_OP_reg10 W10)
>>                DW_AT_GNU_all_call_sites        (true)
>>                DW_AT_name      ("foo")
>>                DW_AT_decl_file ("/home/yhs/work/tests/llvm/dom/t.c")
>>                DW_AT_decl_line (3)
>>                DW_AT_prototyped        (true)
>>                DW_AT_type      (0x00000051 "int")
>>                DW_AT_external  (true)
>> 
>> 0x00000043:     DW_TAG_formal_parameter
>>                  DW_AT_location        (DW_OP_reg1 W1)
>>                  DW_AT_name    ("arg")
>>                  DW_AT_decl_file       ("/home/yhs/work/tests/llvm/dom/t.c")
>>                  DW_AT_decl_line       (3)
>>                  DW_AT_type    (0x00000058 "bob *")
>> 
>> 0x00000050:     NULL
>> 
>> 0x00000051:   DW_TAG_base_type
>>                DW_AT_name      ("int")
>>                DW_AT_encoding  (DW_ATE_signed)
>>                DW_AT_byte_size (0x04)
>> 
>> 0x00000058:   DW_TAG_pointer_type
>>                DW_AT_type      (0x0000005d "bob")
>> 
>> 0x0000005d:   DW_TAG_structure_type
>>                DW_AT_name      ("bob")
>>                DW_AT_byte_size (0x08)
>>                DW_AT_decl_file ("/home/yhs/work/tests/llvm/dom/./t.h")
>>                DW_AT_decl_line (5)
>> 
>> 0x00000065:     DW_TAG_member
>>                  DW_AT_name    ("a")
>>                  DW_AT_type    (0x00000051 "int")
>>                  DW_AT_decl_file       ("/home/yhs/work/tests/llvm/dom/./t.h")
>>                  DW_AT_decl_line       (6)
>>                  DW_AT_data_member_location    (0x00)
>> 
>> 0x00000071:     DW_TAG_member
>>                  DW_AT_type    (0x00000079 "bob::union ")
>>                  DW_AT_decl_file       ("/home/yhs/work/tests/llvm/dom/./t.h")
>>                  DW_AT_decl_line       (7)
>>                  DW_AT_data_member_location    (0x04)
>> 0x00000079:     DW_TAG_union_type
>>                  DW_AT_byte_size       (0x04)
>>                  DW_AT_decl_file       ("/home/yhs/work/tests/llvm/dom/./t.h")
>>                  DW_AT_decl_line       (7)
>> 
>> 0x0000007d:       DW_TAG_member
>>                    DW_AT_name  ("i")
>>                    DW_AT_type  (0x00000051 "int")
>>                    DW_AT_decl_file     ("/home/yhs/work/tests/llvm/dom/./t.h")
>>                    DW_AT_decl_line     (8)
>>                    DW_AT_data_member_location  (0x00)
>> 
>> 0x00000089:       NULL
>> 
>> 0x0000008a:     NULL
>> 
>> 0x0000008b:   NULL
> 
> Fantastic, thank you for verifying!
> 
> ~Aaron
> 
>> 
>>> 
>>> ~Aaron
>>> 
>>>> 
>>>> A specification
>>>> ===============
>>>> 
>>>> The bpf_dominating_decl attribute, as its name suggested, is a bpf target
>>>> specific attribute. The attribute is also for C language only.
>>>> 
>>>> The syntax:
>>>>   __attribute__((bpf_dominating_decl))
>>>> The attribute can be attached to:
>>>>   record, typedef and enum type declarations
>>>> 
>>>> If a type declaration is annotated with bpf_dominating_decl attribute,
>>>> that type declaration is considered as a dominating decl and later
>>>> any conflict declaration, compatible or not-compatible,
>>>> will be silently ignored.
>>>> 
>>>> The following is a simple example,
>>>> 
>>>>  #pragma clang attribute push(__attribute__((bpf_dominating_decl)), \
>>>>                               apply_to = any(record, enum, type_alias))
>>>>  typedef unsigned bob;
>>>>  #pragma clang attribute pop
>>>>  typedef int bob;
>>>> 
>>>> In the above example, the second typedef will not incur a compilation
>>>> error. The same for record types or enum values.
>>>> 
>>>> Representation within the appropriate governing organization
>>>> ============================================================
>>>> 
>>>> N/A
>>>> 
>>>> A long-term support plan
>>>> ========================
>>>> 
>>>> The feature will be supported for ever.
>>>> 
>>>> A high-quality implementation
>>>> =============================
>>>> 
>>>> The feature has been implemented in [1] and has been code reviewed.
>>>> One thing not in [1] but in [1]'s comment is to link
>>>> bpf_dominating_decl attribute to CO-RE preserve_access_index attribute
>>>> in clang frontend. That is, a type is considered dominating only if
>>>> the type has both bpf_dominating_decl and preserve_access_index attributes.
>>>> We can amend the implementation with this if agreed upon.
>>>> 
>>>> A test suite
>>>> ============
>>>> 
>>>> The [1] has implemented necessary tests for the feature.
>>>> 
>>>> References
>>>> ==========
>>>>  [1] https://reviews.llvm.org/D111307
>>>>  [2] https://lists.llvm.org/pipermail/cfe-dev/2021-August/068696.html
>>>>  [3] https://www.grant.pizza/blog/vmlinux-header/
>>>>  [4] https://blog.aquasec.com/vmlinux.h-ebpf-programs
>>>>  [5] https://lore.kernel.org/bpf/CAEf4BzZ1K9uQ-K1Q2BCSBesR3RUj_NW8uHu6NduoX7uLBdfukQ@mail.gmail.com/
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20220106/7bd8a9c1/attachment-0001.html>


More information about the cfe-dev mailing list