[cfe-dev] BPF: adding new clang extension bpf_dominating_decl attribute

Thu Jan 27 12:33:37 PST 2022

On Thu, Jan 6, 2022 at 4:47 AM Aaron Ballman <aaron at aaronballman.com> wrote:
>
> On Wed, Jan 5, 2022 at 3:31 PM Y Song <ys114321 at gmail.com> wrote:
> >
> > On Mon, Jan 3, 2022 at 12:52 PM Aaron Ballman <aaron at aaronballman.com> wrote:
> > >
> > > On Mon, Dec 20, 2021 at 7:06 PM Y Song <ys114321 at gmail.com> wrote:
> > > >
> > > > This is a request to add a clang extension, more specifically,
> > > > a clang attribute named bpf_dominating_decl. This clang
> > > > extension is intended to be used for the bpf target only. Below
> > > > I will explain in detail this proposed attribute, why the
> > > > bpf community needs it, how it will be used, and other
> > > > aspects as described in https://clang.llvm.org/get_involved.html.
> > >
> > > Thank you for this RFC!
> >
> > You are welcome!
> >
> > >
> > > > Evidence of a significant user community
> > > > ========================================
> > > >
> > > > We are proposing a new clang attribute bpf_dominating_decl which
> > > > was implemented in [1]. The feature has also been discussed in
> > > > cfe-dev mailing list ([2]). It is intended to solve the
> > > > following use case:
> > > >   - A tool generated vmlinux.h is used for CO-RE (compile once,
> > > >     run everywhere) use cases.
> > > >   - vmlinux.h contains all kernel data structures for a particular config,
> > > >     see [3] and [4] about how it is generated and why it is important.
> > > >   - but vmlinux.h may have type conflicts with other headers
> > > >     the user intends to use.
> > > >
> > > > Macros are one such example. Currently CO-RE relocation cannot
> > > > handle macros, and macros may be defined in header files accessible
> > > > to the user. If those header files have type conflicts with vmlinux.h,
> > > > users are forced to copy the macro definitions. The same goes for simple
> > > > static inline functions defined in header files. This issue has been
> > > > discussed before, and that is why we proposed this attribute. And just last
> > > > week, it was discussed/complained about again ([5]) that users cannot use
> > > > some non-kernel types from a header file which has type conflicts
> > > > with vmlinux.h.
> > > >
> > > > If it is accepted, the attribute will be used inside vmlinux.h, so
> > > > virtually all bpf developers will use it, and it will make bpf developers
> > > > more productive by sparing them from copying macros, static inline
> > > > functions or non-kernel types.
> > >
> > > I'm uncomfortable with this attribute. Typically, attributes extend
> > > rather than redefine the language. e.g., you might add attributes for
> > > better performance or diagnostic characteristics, but you typically
> > > should not use an attribute to redefine the basic premises of the
> > > language.
> > >
> > > In this particular case, the attribute is used to tell the compiler to
> > > ignore type redefinition errors and instead pick a "dominating"
> > > declaration for the type. While C isn't as type sensitive as C++ is,
> > > it still has _Generic, __typeof__, and other tricks that can expose
> > > type system shenanigans like this in surprising ways. Given that type
> > > size information is critical for many things in C (memcpy, memcmp,
> > > pointer arithmetic with offsetof, etc), I'm uncomfortable with the
> > > security aspects of the likely type confusion stemming from this being
> > > so novel in C.
> >
> > To limit the potential impact, as the RFC suggested, we can restrict the
> > attribute to CO-RE relocatable types only. bpf developers are already
> > aware of, and know how to properly use, the builtins for CO-RE relocatable
> > types, and the types we are targeting are also CO-RE relocatable types.
>
> Limiting this to just the target and just for specific types will
> certainly help, but doesn't really eliminate the fact that this
> attribute is definitely not very C-like in what it does. As mentioned
> on the code review, we have to do some interesting work to ensure we
> emit the correct diagnostics for conformance to C (or, alternatively,
> document that this target is not a C target, but that leads right back
> to my argument that this is making a new language rather than
> extending an existing one).
>
> > For example, for type size, in
> > https://github.com/torvalds/linux/blob/master/tools/lib/bpf/bpf_core_read.h
> > we have the following macro:
> >
> > #define bpf_core_type_size(type)                                            \
> >         __builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_SIZE)
> >
> > So users can get the type size for a particular kernel. Note that the type
> > might have different sizes for different kernels.
> >
> > For the offsetof issue, bpf_core_read.h provides the following macro:
> >
> > #define BPF_CORE_READ(src, a, ...) ({                                       \
> >         ___type((src), a, ##__VA_ARGS__) __r;                               \
> >         BPF_CORE_READ_INTO(&__r, (src), a, ##__VA_ARGS__);                  \
> >         __r;                                                                \
> > })
> >
> > which eventually uses the builtin __builtin_preserve_access_index(),
> > so the bpf loader can adjust the field offsets properly.
> >
> > So for relocatable types, users won't use typeof or offsetof.
>
> Will their use be diagnosed for BPF targets?

Sorry for the late reply. We had some internal discussions about the
best way to move forward.

As for whether the use of typeof or offsetof is diagnosed for BPF
targets: yes, we do some diagnosis in the bpf loader. The compilation
will succeed with the *provided* types, but at load time the bpf loader
checks the host vmlinux BTF types. If the type or the field does not
exist in the host vmlinux BTF, the bpf loader issues an error. Otherwise,
the bpf loader adjusts the access based on the host vmlinux BTF.
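To make that load-time behavior concrete, here is a toy model in plain C. It is not libbpf code: `struct btf_struct`, `struct btf_field`, `struct field_reloc`, `resolve_field()` and `resolve_size()` are hypothetical names, the table stands in for the host vmlinux BTF, and the size and offsets are made up. The real relocation logic is considerably more involved (it matches types via BTF type IDs and access strings rather than plain name lookups); this only sketches the "error if missing, otherwise patch" rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the host kernel's BTF: each struct the
 * running kernel has, with its size and field offsets in bytes. */
struct btf_field {
    const char *name;
    long offset;
};

struct btf_struct {
    const char *name;
    long size;
    const struct btf_field *fields;
    size_t nr_fields;
};

/* Hypothetical relocation record for one field access: the names the
 * program used plus the offset assumed at compile time from vmlinux.h. */
struct field_reloc {
    const char *type_name;
    const char *field_name;
    long compiled_offset;
};

/* Made-up host layout whose offsets differ from the compile-time ones. */
static const struct btf_field task_fields[] = {
    { "pid", 1256 },
    { "tgid", 1260 },
};

static const struct btf_struct host_btf[] = {
    { "task_struct", 9728, task_fields, 2 },
};

#define NR_HOST_TYPES (sizeof(host_btf) / sizeof(host_btf[0]))

/* Look up a struct by name in the host table; NULL if absent. */
static const struct btf_struct *find_type(const struct btf_struct *host,
                                          size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(host[i].name, name) == 0)
            return &host[i];
    return NULL;
}

/* Field relocation: the host offset, or -1 if the type or the field is
 * missing, in which case a loader would reject the program. */
static long resolve_field(const struct btf_struct *host, size_t n,
                          const struct field_reloc *r)
{
    assert(r != NULL);
    const struct btf_struct *t = find_type(host, n, r->type_name);
    if (t == NULL)
        return -1;
    for (size_t i = 0; i < t->nr_fields; i++)
        if (strcmp(t->fields[i].name, r->field_name) == 0)
            return t->fields[i].offset;
    return -1;
}

/* Type-size relocation, in the spirit of bpf_core_type_size():
 * the host size, or -1 if the type is missing. */
static long resolve_size(const struct btf_struct *host, size_t n,
                         const char *type_name)
{
    const struct btf_struct *t = find_type(host, n, type_name);
    return t != NULL ? t->size : -1;
}
```

With these made-up numbers, a `pid` access compiled against offset 1192 resolves to 1256 on this host, while an access to a field or type the host lacks resolves to -1, mirroring the load-time error described above.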

>
> > Otherwise, programs won't be portable even without bpf_dominating_decl
> > attribute.
> >
> > >
> > > That said, we do have *one* attribute that I consider to be a
> > > "redefine the language in fundamental ways" feature --
> > > [[clang::overloadable]] allows you to define overload sets in C, which
> > > is a distinctly not-C thing to do because of the name mangling
> > > involved. However, that attribute introduces the C++ semantics into C
> > > whereas the BPF dominating declaration attribute is introducing wholly
> > > novel semantics. So I don't really consider [[clang::overloadable]] as
> > > direct precedent for this.
> > >
> > > >
> > > > A specific need to reside within the Clang tree
> > > > ===============================================
> > > >
> > > > The proposed attribute will be processed by the Clang frontend's lexer
> > > > and Sema, so it would be best for it to reside within the Clang tree.
> > >
> > > Would it be plausible/appropriate to instead run the source code
> > > through a processing tool which emits modified source code with the
> > > correct definitions instead of hoping this dominating declaration
> > > works out? In this case, I think the user will get better diagnostic
> >
> > Theoretically it is possible. We would need to run a preprocessor, parse
> > the program including all include files, do exactly what
> > https://reviews.llvm.org/D111307 does to ignore those duplicated
> > relocatable types, generate a .i file and feed it into clang.
> > But this duplicates a lot of current clang code, and the tool itself cannot
> > automatically benefit from future clang improvements. So I think in-tree
> > support is the best option, with the least maintenance burden.
>
> I think Clang's architecture as a series of libraries provides
> extensive support for building your own tooling to perform these kinds
> of code transformations. For example, clang-tools-extra has
> clang-change-namespace, clang-include-fixer, clang-reorder-fields, etc
> and they all make use of Clang as a library without needing to modify
> Clang itself. I think it would be reasonable to explore the idea of
> adding such a tool to perform the rewriting for you (it could
> potentially even live in-tree) as you would continue to use the
> existing Clang code and still benefit from future Clang improvements.

We discussed the extra-tool approach internally. We do not intend to go
this route right now, as yet another tool adds complexity to the build
process and would probably discourage people from using it. David Rector
has another idea: detecting and removing duplicated types during the
parsing stage (before semantic analysis). We would like to explore that
first.
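As a rough illustration of the filtering rule that any of these approaches needs, here is a toy C model of deciding which type definitions survive. It is not Clang code: `struct type_decl`, `keep_decl()` and `filter_decls()` are hypothetical, and it assumes a single pass in which a definition marked dominating (e.g. one coming from vmlinux.h) must appear before the conflicting one. Duplicates with no dominating definition are kept, so the ordinary redefinition error would still be reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one type definition seen while parsing. */
struct type_decl {
    const char *name;   /* e.g. "struct iphdr" */
    const char *origin; /* header the definition came from */
    bool dominating;    /* e.g. every definition coming from vmlinux.h */
};

/* Decide whether decls[idx] survives, looking only at earlier
 * definitions (single pass): a non-dominating definition is dropped
 * if an earlier dominating one has the same name. */
static bool keep_decl(const struct type_decl *decls, size_t idx)
{
    if (decls[idx].dominating)
        return true;
    for (size_t i = 0; i < idx; i++)
        if (decls[i].dominating && strcmp(decls[i].name, decls[idx].name) == 0)
            return false;
    return true;
}

/* Copy the surviving definitions into out (room for n entries) and
 * return how many were kept. */
static size_t filter_decls(const struct type_decl *decls, size_t n,
                           struct type_decl *out)
{
    assert(decls != NULL && out != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (keep_decl(decls, i))
            out[kept++] = decls[i];
    return kept;
}
```

For instance, if vmlinux.h and netinet/ip.h both define `struct iphdr`, only the vmlinux.h definition is kept, while an unrelated type from the second header (a made-up `struct my_cfg` in the check) passes through untouched.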

Thanks!

Yonghong

[...]