[cfe-dev] support __attribute__((weak)) for non-primitive types

Y Song via cfe-dev cfe-dev at lists.llvm.org
Sun Sep 5 19:52:34 PDT 2021


On Fri, Sep 3, 2021 at 3:25 PM Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Fri, 3 Sept 2021 at 13:57, Andrii Nakryiko via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>
>> On 8/30/21 8:43 AM, Y Song wrote:
>> > On Mon, Aug 30, 2021 at 4:24 AM Aaron Ballman <aaron at aaronballman.com> wrote:
>> >> On Sun, Aug 29, 2021 at 2:17 AM Y Song <ys114321 at gmail.com> wrote:
>> >>> On Thu, Aug 26, 2021 at 4:24 AM Aaron Ballman <aaron at aaronballman.com> wrote:
>> >>>> On Wed, Aug 25, 2021 at 1:59 PM Y Song <ys114321 at gmail.com> wrote:
>> >>>>> On Wed, Aug 25, 2021 at 5:34 AM Aaron Ballman <aaron at aaronballman.com> wrote:
>> >>>>>> On Tue, Aug 24, 2021 at 1:18 AM Y Song <ys114321 at gmail.com> wrote:
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> This is a BPF use case. We would like to check whether
>> >>>>>>> __attribute__((weak)) support for non-primitive types would be
>> >>>>>>> possible in clang or not. For example, if we have two types like
>> >>>>>>>     struct t { int a; } __attribute__((weak));
>> >>>>>>>     struct t { int a; };
>> >>>>>>>
>> >>>>>>>     typedef unsigned __u32;
>> >>>>>>>     typedef unsigned __u32 __attribute__((weak));
>> >>>>>>> the compilation should not fail. It should just ignore weak definitions.
>> >>>>>>>
>> >>>>>>> In BPF, to support CO-RE (compile once and run everywhere), we
>> >>>>>>> generate a *generic* vmlinux.h
>> >>>>>>> (https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html )
>> >>>>>>> which is included in the bpf program. The bpf loader (libbpf) will
>> >>>>>>> adjust field offsets properly based on host record layout so the same
>> >>>>>>> program can run on different kernels which may have different record
>> >>>>>>> layouts.
>> >>>>>>>
>> >>>>>>> But vmlinux.h does not support macros (and simple static inline helper
>> >>>>>>> functions in kernel header files). Currently, users need to define
>> >>>>>>> these macros and static inline functions explicitly in their bpf
>> >>>>>>> program. What we are thinking is to permit users to include kernel
>> >>>>>>> header files containing these macros/static inline functions. But this
>> >>>>>>> may introduce duplicated types as these types have been defined in
>> >>>>>>> vmlinux.h.
>> >>>>>>>
>> >>>>>>> For example, we may have:
>> >>>>>>>      socket.h:
>> >>>>>>>          struct socket { ... };
>> >>>>>>>          #define TCP    6
>> >>>>>>>      vmlinux.h:
>> >>>>>>>          struct socket { ... };
>> >>>>>>>      bpf.c:
>> >>>>>>>          #include <socket.h>
>> >>>>>>>          #include <vmlinux.h>
>> >>>>>>>          ... TCP ...
>> >>>>>>>
>> >>>>>>> In this case, struct "socket" is defined twice. If we can have something like
>> >>>>>>>      bpf.c:
>> >>>>>>>        #pragma clang attribute push (__attribute__((weak)), apply_to = record)
>> >>>>>>>        #include <socket.h>
>> >>>>>>>        #pragma clang attribute pop
>> >>>>>>>        #include <vmlinux.h>
>> >>>>>>>          ... TCP ...
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Then bpf program can get header file macro definitions without copying
>> >>>>>>> explicitly.
>> >>>>>>>
>> >>>>>>> We need support for "apply_to = typedef" in the above pragma, so we
>> >>>>>>> can deal with typedef types as well.
>> >>>>>>>
>> >>>>>>> Another issue is related to what if two types are not compatible. For example,
>> >>>>>>>      struct t { int a; } __attribute__((weak));
>> >>>>>>>      struct t { int a; int b; };
>> >>>>>>> I think we could have a flag to control whether we allow incompatible
>> >>>>>>> weak type or not.
>> >>>>>> What's the use case for allowing incompatible weak types? Having
>> >>>>> In our use case, we are talking about incompatible types between
>> >>>>>     . vmlinux.h, generated with a particular kernel config which tries to
>> >>>>>       contains all fields so it can be used for different kernel builds
>> >>>>>     . kernel header files on a particular kernel build which may have
>> >>>>>       less fields than vmlinux.
>> >>>>>
>> >>>>> The below is a concrete example.
>> >>>>> In linux kernel internal header files, many struct definition depends on
>> >>>>> build time configuration. For example, struct "task_struct" in
>> >>>>> linux:include/linux/sched.h,
>> >>>>> https://github.com/torvalds/linux/blob/master/include/linux/sched.h#L661-L1418
>> >>>>> For example, whether field "numa_scan_seq" exists in the struct or not
>> >>>>> depends on
>> >>>>> config option CONFIG_NUMA_BALANCING.
>> >>>>>
>> >>>>> It is totally possible vmlinux.h "task_struct" structure has field
>> >>>>> "numa_scan_seq"
>> >>>>> while the host include/linux/sched.h "task_struct" structure does not have
>> >>>>> "numa_scan_seq" field.
>> >>>> Ah, thank you for the explanation!
>> >>> After a little more thought, I think we will have vmlinux.h ahead of
>> >>> socket.h to make compiler easy to resolve weak types.
>> >>>
>> >>> bpf.c:
>> >>>       #include <vmlinux.h>
>> >>>       #pragma clang attribute push (__attribute__((weak)), apply_to = any)
>> >>>       #include <socket.h>
>> >>>       #pragma clang attribute pop
>> >>>         ... TCP ...
>> >>>
>> >>> This way,  any type in socket.h, if it is also in vmlinux.h, will be quickly
>> >>> discarded.
>> >> So the non-weak types are expected to be seen first, so you'll pick
>> >> the non-weak type every time unless the type only exists in weak form?
>> >> That seems reasonable.
>>
>> I'm sorry I'm a bit late to discussion, been out on vacation. But I'm
>> super interested in this feature and the one who added vmlinux.h to BPF
>> land, so wanted to provide some feedback as well.
>>
>>
>> Unless it complicates the implementation extremely, I hope we don't have
>> to specify that non-weak type had to be first (or last, or any defined
>> order). It would add unnecessary complexity and surprise for users.
>
>
> I think we will very likely need to require that a non-weak definition of the type is seen before any weak definition. Otherwise we risk the possibility that the weak definition of the type is used in some way before we see the "real" definition, which will lead to serious problems. For example, the compiler could compute the struct's layout based on the first (weak) definition, then later find that it should discard that definition, but it might be too late: we might have already made hard-to-undo decisions based on that information. Discarding a weak definition when we've already seen a non-weak one doesn't have this kind of problem.
>
> If that makes the use of a compiler-based approach untenable for your use case, you should look hard for other alternatives. Asking us to support replacing the definition of a type after the fact would be imposing a very high cost on us.

Thanks for clarification. Yes, we are aware of this potential issue
during discussion with Aaron. We are working on a POC now and
certainly we don't want to make implementation complicated. What you
mentioned totally makes sense and we will keep this in mind. Thanks!

>
>> Think about weak symbols as an analogy. If linker demanded that non-weak
>> symbol has to be encountered before weak symbol, it would complicate
>> things for end users immensely.
>>
>>
>> Regarding the example that Yonghong provided. I was hoping that in
>> practice it will be the other way: vmlinux.h internally has
>>
>> #pragma clang attribute push (__attribute__((weak)), apply_to = any)
>> /* vmlinux.h types here */
>> #pragma clang attribute pop
>>
>> and then user code looks just like this:
>>
>> #include <vmlinux.h>
>> #include <socket.h>
>>
>> So socket.h types actually would override vmlinux.h definitions, if they are conflicting. One hiccup with that is that for those types in socket.h, we'd like to "inherit" __attribute__((preserve_access_index), to make them BPF CO-RE-relocatable. Continuing analogy with weak symbols, it's similar to inheriting hidden symbol visibility, which can be defined on weak or extern declaration, but is applied to a final resolved symbol.
>>
>> Would that be possible to support?
>>
>>
>> >>
>> >>>>>> multiple distinct types seems like it will make for confusion. e.g.,
>> >>>>>>
>> >>>>>> struct foo { int a; };
>> >>>>>> struct __attribute__((whatever_we_name_it)) foo { float f; };
>> >>>>>>
>> >>>>>> int func(struct foo *f) {
>> >>>>>>    return f->f + f->a;
>> >>>>>> }
>> >>>>>>
>> >>>>>> Which lookup causes the error (or do they succeed somehow)?
>> >>>>> In our use case, we intend to ensure struct field access *always*
>> >>>>> coming from final structure.
>> >>>> Okay, so "last definition seen wins"?
>> >>>>
>> >>>> I think it's a bit odd that the API designer can write a structure
>> >>>> definition and assume that it's the one the user will use, and then a
>> >>>> later definition can come along and silently override it. Say I have
>> >>>> the following:
>> >>>>
>> >>>> // AwesomeAPI.h
>> >>>> struct Cool {
>> >>>>    int a;
>> >>>>    float f;
>> >>>> };
>> >>>>
>> >>>> void DoAwesomeStuff(struct Cool *C);
>> >>>>
>> >>>> // AwesomeAPI.c
>> >>>> #include "AwesomeAPI.h"
>> >>>> #include "string.h"
>> >>>>
>> >>>> void DoAwesomeStuff(struct Cool *C) {
>> >>>>    C->a = 12;
>> >>>>    C->f = 1.2;
>> >>>> }
>> >>>>
>> >>>> // LessAwesomeAPI.c
>> >>>> #include "AwesomeAPI.h"
>> >>>>
>> >>>> struct __attribute__((whatever_we_name_this)) Cool {
>> >>>>    int a;
>> >>>> };
>> >>>>
>> >>>> int main(void) {
>> >>>>    struct Cool C;
>> >>>>    DoAwesomeStuff(&C);
>> >>>> }
>> >>>>
>> >>>> Assume these TUs are compiled separately and then linked together
>> >>>> later. Is there a buffer overflow in DoAwesomeStuff() because the Cool
>> >>>> object from LessAwesomeAPI.c is only sizeof(int) while the Cool object
>> >>>> expected within AwesomeAPI.c is sizeof(int) + sizeof(float)?
>> >>> Thanks for the detailed explanation in the above. This indeed could make
>> >>> it really hard to use for applications.
>> >>>
>> >>> For the weaktype attribute, I am not sure whether there is a use case or
>> >>> not outside the bpf community. How about we design the attribute to
>> >>> be bpf specific. We can name the attribute to be "bpf_weaktype".
>> >> That could help make the expected usage domain more clear, but we'll
>> >> still need to consider accidentally incorrect usage to help the bpf
>> >> community. In the above example, I think it's fine to say "if this
>> >> happens, it's UB" and we should probably document it as such.
>>
>> I might be missing some details here, but I see two issues in the
>> example provided above.
>>
>>
>> First, "unstable" sizeof(). In BPF CO-RE world, using plain sizeof() is
>> an anti-pattern, unless you know that sizeof() value won't change (some
>> fixed UAPI struct) or it can only ever increase and we don't really need
>> to have entire struct (again, this will mostly be UAPI types). If you
>> have no such guarantee and need to know exact actual sizeof(), there is
>> a CO-RE relocation that compiler can emit, when user uses proper
>> built-in (we provide a convenient macro for that with libbpf,
>> bpf_core_type_size()).
>>
>>
>> Second problem is that if some part of source code assumed C->f existed,
>> but actual resolved type doesn't have field f. If that's the case within
>> single source file, then it's clearly a compilation error, right? If
>> that happens in two different CUs, then it's not a problem, because each
>> CU gets its own view of entire set of types, which will be recorded in
>> corresponding DWARF and BTF type information.
>>
>>
>> I'm not sure how Clang would deal with that when linking multiple CUs
>> through LTO built (sorry, I'm probably misusing the terminology). But if
>> we use BPF static linking API through libbpf, it all will work out ok, I
>> think, because we'll just merge these two incompatible BTF type
>> definitions for struct Cool, and at runtime, if actual struct Cool has
>> field f -- then all good. If not, it will be an error during
>> loading/processing BPF program by libbpf.
>>
>>
>> >>
>> >>> We can change rules to be ""last definition seen wins"".
>> >>>    - if we only see bpf_weaktype types, last seen definition wins.
>> >>>    - as soon as we see one non bpf_weaktype type, that one wins.
>> >>>      Any later weaktype type will be ignored and and any later non-weaktype
>> >>>      type will cause a redefine compilation error.
>> >> I think that makes sense, though I think it could also make sense to
>> >> say "If we only see bpf_weaktype types, the first definition wins and
>> >> any later weaktype definition for the same type will be diagnosed as
>> >> an error. Effectively, you'll see 0, 1, or 2 definitions, but never
>> >> more than two.
>> > What you suggested in above makes sense for us. In bpf use case,
>> > we never see more than 2 definitions. It will be either
>> >    (1). non weaktype, or
>> >    (2). non weaktype followed by a weaktype, or
>> >    (3). weaktype
>> > I think any other cases w.r.t. types can be considered as an error.
>>
>> Sorry, again, I have to disagree. That's adding too many assumptions.
>>
>> One thing we didn't mention yet is kernel module BTFs. Right now we
>> don't have the same vmlinux.h for kernel modules, but I think we will
>> very soon. In such case some types might be defined (as weak_type) in
>> both vmlinux.h and <some_kernel_module>.h. Maybe not, but unless these
>> restrictions buy us tremendous amount of simplicity internally, I ask to
>> not add unnecessary assumptions about ordering and count. We don't have
>> them with weak symbols, so I think it makes it more natural to not have
>> that for types.
>>
>>
>> >>> The following is a concrete example:
>> >>>     struct t { int a; } __bpf_weaktype;
>> >>>     struct d1 { struct t f; };  // use struct t {int a;}
>> >>>     struct t { int b; int c; } __bpf_weaktype;
>> >>>     struct d2 { struct t f; }; // use struct t {int b; int c; }
>> >> The above bit could catch users by surprise. One potential way to
>> >> implement this functionality would be to use a two-pass system where
>> >> pass one finds all the type definitions and resolves them to one type
>> >> and pass two does the actual compilation using that resolved type. In
>> >> that case, a reasonable user might expect d2 to use the `int a;`
>> >> definition. However, if we made additional weak definitions be an
>> >> error, they'd never get into this situation. (And if we have to
>> >> support multiple weaktypes, then we could perhaps warn on the
>> >> situation where a weak definition was used somewhere and then another
>> >> weak definition of the same type is encountered, if that'd make sense
>> >> to do.)
>> > Thanks for the explanation. Let us not allow multiple weak types.
>> > This is not a bpf use case. This should simplify implementation a lot.
>>
>> In BPF CO-RE world the actual size of the type and field offsets don't
>> matter. What if __bpf_weaktype implies preserve_access_index
>> automatically? Would that make it a non-issue? With BPF CO-RE and
>> relocatable type definitions the point is to record field access
>> intentions, not the actual memory layout (which will be different at
>> runtime anyways). So in BPF CO-RE world the above situation isn't all
>> that surprising.
>>
>>
>> >
>> >>>     struct t {char c; };
>> >>>     struct d3 { strut t f; };  // use struct t { char c; }
>> >>>     struct t {short s; } __bpf_weaktype;
>> >>>     struct d4 { struct t f; }; // use struct t { char c; }
>> >>>     struct t {long long l; }; // compilation error: redefinition
>> >>>
>> >>>>> So, if the final resolved type is in
>> >>>>> vmlinux.h, the field must be in vmlinux.h structure, otherwise,
>> >>>>> issue an error. The same if the final resolved type is in host
>> >>>>> kernel header file.
>> >>>>>
>> >>>>> Therefore, in the above case, f->f is an error since final
>> >>>>> resolved type is "struct foo { int a; }" and field "f" does not
>> >>>>> exist in struct foo.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>>> It would be good if I can get some suggestions/directions on whether
>> >>>>>>> such a feature is possible or not for clang frontend.
>> >>>>>> What happens if the only definition seen in a TU is marked as being weak?
>> >>>>> That weak will be promoted to the resolved type.
>> >>>>>> In terms of incompatible weak types, are these incompatible?
>> >>>>>>
>> >>>>>> struct base {
>> >>>>>>    int a;
>> >>>>>> };
>> >>>>>>
>> >>>>>> struct foo : base {
>> >>>>>>    int b;
>> >>>>>> };
>> >>>>>>
>> >>>>>> struct __attribute__((whatever_we_name_it)) foo {
>> >>>>>>    int a, b;
>> >>>>>> }
>> >>>>> Our use case is for C only.
>> >>>> That's good to know. In C, are these incompatible?
>> >>>>
>> >>>> struct foo {
>> >>>>    int a;
>> >>>>    float f;
>> >>>> };
>> >>>>
>> >>>> struct __attribute__((whatever_we_name_this)) foo {
>> >>>>    struct {
>> >>>>      int a;
>> >>>>      float f;
>> >>>>    };
>> >>>> };
>> >>>>
>> >>>> (We could ask the same question using anonymous unions, etc.)
>> >>> Based on what I described in the above, the first "struct foo" is a
>> >>> non bpf_weaktype and
>> >>> the second is a bpf_weaktype, so the second will be ignored regardless
>> >>> of whether
>> >>> they are compatible or not.
>> >> Makes sense, thanks!
>>
>> They are also completely compatible from BPF CO-RE standpoint. If kernel
>> defines foo one way, and BPF program uses the other definition, it all
>> works correctly with BPF CO-RE.
>>
>> >>
>> >>>>>> I'm not comfortable with the idea of having two different type
>> >>>>>> definitions in the same TU unless it's clear and consistent which one
>> >>>>>> "wins". If the type definitions are the same between the types, then I
>> >>>>>> don't see a problem because that limits the chances for surprises.
>> >>>>>> When the type definitions differ, it becomes harder to reason about
>> >>>>>> the way lookup works, how the type is laid out at runtime, what the
>> >>>>>> ABI implications are, etc. Is there a consistent rule for which type
>> >>>>>> is the one to be honored (e.g., only one definition is allowed to
>> >>>>>> elide the attribute and that's the canonical definition of the type
>> >>>>>> which always "wins")?
>> >>>>> All these are very good questions. To handle incompatible types,
>> >>>>> yes, we need clear rules. Otherwise, it will cause confusion.
>> >>>>> In our use case, the rule looks like below:
>> >>>>>    - For a particular named type, if only one is defined with
>> >>>>>      "weaktype" attribute, that type becomes resolved type.
>> >>>>>    - If only one is defined without "weaktype" attribute,
>> >>>>>      that type becomes resolved type.
>> >>>>>    - If you have one "weaktype" and one non-weaktype types,
>> >>>>>      the non-weaktype becomes the resolved type.
>> >>>> Oh, this is different from my interpretation of "last one wins" above.
>> >>>> If so, you can reverse which definition the attribute is applied to
>> >>>> and have the same question about buffer overflows.
>> >>>>
>> >>>>>    - more than one weaktype type or more than one non-weaktype
>> >>>>>      type will cause a compilation error.
>> >>>> Ah, so there will be exactly 0, 1, or 2 definitions?
>> >>>>
>> >>>>>    - any field access will be based on resolve type.
>> >>>>>
>> >>>>> Does this sound reasonable? Or I could still miss some cases?
>> >>>> I apologize if I'm being dense, but I'm still trying to wrap my head
>> >>>> around how this works in practice. I'm worried about linking more than
>> >>>> one TU together where this resolution is different and getting ODR
>> >>> Indeed sizeof() may get weird result even in my above example
>> >>> as at different program point, sizeof() of the type might have
>> >>> different values. This may propagate to other types as well.
>>
>> See above. With BPF CO-RE, using straight sizeof() is discouraged. We
>> have a relocatable way to fetch actual type (or field) size.
>>
>>
>> >>>
>> >>>> violations as a result, getting weird situations where sizeof() the
>> >>>> type differs between TUs and causes issues, etc. I suppose one way to
>> >>>> dig into that is: if there was such a thing as a "type sanitizer"
>> >>>> (akin to UBSan but for types), what sort of functionality would you
>> >>>> expect it needs to help programmers detect bugs at runtime?
>> >>> This is the reason that I would like to limit this attribute to be bpf only.
>> >>> All bpf programs need to pass the kernel verifier before being able to
>> >>> run in the kernel. They may get a program rejected by the kernel or
>> >>> get an incorrect answer but not crash the kernel.
>> >> Oh, that's good to know!
>> >>
>> >>> In bpf use case, I expect users mostly use the code pattern
>> >>>       #include <vmlinux.h>
>> >>>       #pragma clang attribute push (__attribute__((bpf_weaktype)),
>> >>> apply_to = any)
>> >>>       #include <socket.h>
>> >>>       #pragma clang attribute pop
>> >>>         ... TCP ...
>> >>>         ... use types from vmlinux.h ...
>> >>>
>> >>> so weird sizeof() issue shouldn't happen.
>> >> My gut instinct is that there will be weird issues from this because
>> >> of odd interactions between types (and possibly due to inline
>> >> functions in header files that use the types), but I think there's
>> >> enough here to be worth experimenting with.
>> > Sounds great! We will start to do some prototyping and will have
>> > a RFC diff for discussion once it is ready. Thanks!
>>
>> Please consider the point about __bpf__weaktype implies
>> preserve_access_index. __bpf_weaktype  without CO-RE relocation is
>> extremely confusing indeed, and not useful at all. Tying one to the
>> other also solves the problem I mentioned at the very beginning, that
>> I'd like vmlinux.h to have all the magic incantations for weaktype,
>> while user code would just do #include <vmlinux.h> and #include
>> <socket.h>, while making resolved types in socket.h automatically
>> CO-RE-relocatable.
>>
>>
>> >> Thanks!
>> >>
>> >> ~Aaron
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


More information about the cfe-dev mailing list