[llvm-dev] [cfe-dev] put "str" in __attribute__((annotate("str"))) to dwarf

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Tue Jun 15 11:03:11 PDT 2021


On Mon, Jun 14, 2021 at 11:23 PM Y Song <ys114321 at gmail.com> wrote:

> On Mon, Jun 14, 2021 at 10:54 PM Andrii Nakryiko
> <andrii.nakryiko at gmail.com> wrote:
> >
> > On Mon, Jun 14, 2021 at 8:30 PM David Blaikie <dblaikie at gmail.com> wrote:
> > >
> > >
> > >
> > > On Mon, Jun 14, 2021 at 7:52 PM Y Song <ys114321 at gmail.com> wrote:
> > >>
> > >> On Mon, Jun 14, 2021 at 6:44 PM David Blaikie <dblaikie at gmail.com> wrote:
> > >> >
> > >> >
> > >> >
> > >> > On Mon, Jun 14, 2021 at 4:54 PM David Rector <davrecthreads at gmail.com> wrote:
> > >> >>
> > >> >>
> > >> >>
> > >> >> On Jun 14, 2021, at 5:33 PM, Y Song via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> > >> >>
> > >> >> On Mon, Jun 14, 2021 at 1:25 PM David Blaikie <dblaikie at gmail.com> wrote:
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >> On Mon, Jun 14, 2021 at 12:25 PM Y Song <ys114321 at gmail.com> wrote:
> > >> >>
> > >> >>
> > >> >> On Fri, Jun 11, 2021 at 9:59 AM Alexei Starovoitov
> > >> >> <alexei.starovoitov at gmail.com> wrote:
> > >> >>
> > >> >>
> > >> >> On Fri, Jun 11, 2021 at 07:17:32AM -0400, Aaron Ballman wrote:
> > >> >>
> > >> >> On Thu, Jun 10, 2021 at 8:47 PM Alexei Starovoitov
> > >> >> <alexei.starovoitov at gmail.com> wrote:
> > >> >>
> > >> >>
> > >> >> On Thu, Jun 10, 2021 at 12:42 PM David Blaikie <dblaikie at gmail.com> wrote:
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >> Any suggestions/preferences for the spelling, Aaron?
> > >> >>
> > >> >>
> > > >> >> I don't know this domain particularly well, so take these suggestions
> > > >> >> with a giant grain of salt:
> > >> >>
> > > >> >> If the concept is specific to DWARF and you don't think it'll need to
> > > >> >> extend into other debug formats, you could go with `dwarf_annotate`.
> > > >> >> If it's not really a DWARF thing but is more about B[P|T]F, then
> > > >> >> `btf_annotate` or `bpf_annotate` could work, but those may be a bit
> > > >> >> mysterious to folks outside of the domain. If it's a generic debug
> > > >> >> info concept, probably `debug_info_annotate` or something.
> > >> >>
> > >> >>
> > >> >>
> > > >> >> Arguably it can/could be a generic debug info or dwarf thing, but for
> > > >> >> now we don't have any use for it other than to squirrel info along to
> > > >> >> BTF/BPF so I'm on the fence about which prefix to use exactly
> > >> >>
> > >> >>
> > >> >> A bit more bike shedding colors...
> > >> >>
> > > >> >> The __rcu and __user annotations might be used by clang itself eventually.
> > > >> >> Currently the "sparse" tool is doing this analysis and warns users
> > > >> >> when an __rcu pointer is incorrectly accessed in the kernel C code.
> > > >> >> If clang can do that directly that could be a huge selling point
> > > >> >> for folks to switch from gcc to clang for kernel builds.
> > > >> >> The front-end would treat such annotations as arbitrary strings, but
> > > >> >> a special "building-linux-kernel-pass" would interpret the semantic context.
> > >> >>
> > >> >>
> > > >> >> Are __rcu and __user annotations notionally distinct things from bpf
> > > >> >> (and perhaps each other as well)? Distinct enough that it would make
> > > >> >> sense to use a different attribute name for user source for each need?
> > > >> >> I suspect the answer is yes given that the existing annotations have
> > > >> >> their own names which are distinct, but I don't know this domain
> > > >> >> enough to be sure.
> > >> >>
> > >> >>
> > > >> >> __rcu and __user don't overlap. __rcu is not a single annotation though.
> > > >> >> It's a combination of annotations in pointers, functions, macros.
> > >> >> Some functions have:
> > >> >> __acquires(rcu)
> > >> >> another function might have:
> > >> >> __acquires(rcu_bh)
> > >> >> There are several flavors of the RCU in the kernel.
> > > >> >> So a single __attribute__((rcu_annotate("foo"))) won't work even
> > > >> >> within RCU scope.
> > >> >> But if we do:
> > > >> >> struct foo {
> > > >> >>  void * __attribute__((tag("ptr.rcu_bh"))) ptr;
> > > >> >> };
> > > >> >> int bar(int) __attribute__((tag("acquires.rcu_bh"))) { ... }
> > > >> >> int baz(int) __attribute__((tag("releases.rcu_bh"))) { ... }
> > > >> >> int qux(int) __attribute__((tag("acquires.rcu_sched"))) { ... }
> > >> >> ...
> > > >> >> The clang pass can parse these strings and correlate one tag to another.
> > >> >> RCU flavors come and go, so clang cannot hard code the names.
> > >> >>
> > >> >>
> > > >> >> Maybe we can name it "bpf_tag", as it is a "tag" for the "bpf" use case?
> > >> >>
> > >> >> David, in one of your early emails, you mentioned:
> > >> >>
> > >> >> ===
> > > >> >> Arguably it can/could be a generic debug info or dwarf thing, but for
> > > >> >> now we don't have any use for it other than to squirrel info along to
> > > >> >> BTF/BPF so I'm on the fence about which prefix to use exactly
> > >> >> ===
> > >> >>
> > > >> >> and suggested that, since it might be used in the future for non-bpf
> > > >> >> things, maybe the name could be a little more generic than bpf-specific.
> > >> >>
> > >> >> Do you have any suggestions on what name to pick?
> > >> >>
> > >> >>
> > >> >>
> > >> >> Nah, not especially. bpf_tag sounds OK-ish to me if it suits you.
> > >> >>
> > >> >>
> > >> >>
> > > >> >> The more generic the better IMO. And the less the need to parse
> > > >> >> string literals the better.
> > > >> >>
> > > >> >> Why not simply `__attribute__((debuginfo("arg1", "arg2", ...)))`, e.g.:
> > >> >>
> > >> >> ```
> > >> >> #define BPF_TAG(...) __attribute__((debuginfo("bpf", __VA_ARGS__)))
> > >> >> struct foo {
> > >> >>  void * BPF_TAG("ptr","rcu","bh") ptr;
> > >> >> };
> > > >> >> #define BPF_RCU_TAG(PFX, ...) BPF_TAG(PFX, "rcu", __VA_ARGS__)
> > >> >> int bar(int) BPF_RCU_TAG("acquires","bh") { ... }
> > >> >> int baz(int) BPF_RCU_TAG("releases","bh") { ... }
> > >> >> int qux(int) BPF_RCU_TAG("acquires","sched") { ... }
> > >> >> ```
> > >> >
> > >> >
> > > >> > Unless Paul & Adrian, etc. chime in in agreement of a more general name,
> > > >> > like 'debuginfo', I'm inclined to avoid that/go with something bpf
> > > >> > specific until there's a broader use case/proposal, something we might be
> > > >> > able to/want to encourage GCC to support too. Otherwise we're taking a
> > > >> > pretty broad attribute name & choosing its behavior when we don't
> > > >> > necessarily have a lot of leverage if GCC ends up using that name for
> > > >> > something else.
> > > >> >
> > > >> > & as for separate strings - maybe, but I'm not sure what that'll look
> > > >> > like in the resulting DWARF, what sort of form would you propose using
> > > >> > to encode that? (same question below \/)
> > >> >
> > >> >>
> > >> >>
> > > >> >> Sounds good. I will use "bpf_tag" as the starting point now.
> > > >> >> Also, it is possible "bpf_tag" may appear multiple times for the same
> > > >> >> function, declaration etc.
> > >> >>
> > >> >> For example,
> > >> >>  #define __bpf_tag(s) __attribute__((bpf_tag(s)))
> > >> >>  int g __bpf_tag("str1") __bpf_tag("str2");
> > > >> >> Let us say we introduced an LLVM vendor attribute DW_AT_LLVM_bpf_tag.
> > > >> >>
> > > >> >> How do you want the above to be represented in DWARF?
> > > >> >>
> > > >> >> My current scheme is to put all bpf_tag's in one string, separated by ",".
> > > >> >> This will make things simpler. So the final output will be
> > > >> >>     DW_AT_LLVM_bpf_tag  "str1,str2"
> > > >> >> I may need to discuss with the kernel folks whether to use a different
> > > >> >> delimiter than ",", but we still represent all tags with ONE string.
> > > >> >>
> > > >> >> But alternatively, it could be represented as a list of strings like
> > > >> >>   DW_AT_LLVM_bpf_tag
> > > >> >>             "str1"
> > > >> >>             "str2"
> > > >> >> similar to DW_AT_location.
> > >> >
> > >> >
> > > >> > What DWARF form were you thinking of using for this? There isn't a
> > > >> > built-in form that provides encoding for multiple delimited/separated
> > > >> > strings that I know of.
> > >>
> > > >> Actually I have not looked at the details of how to implement multiple
> > > >> separated strings yet. Since you mention that no such built-in form
> > > >> exists and the attribute is bpf specific, I will just go with the
> > > >> one-string-only approach (e.g. "str1;str2" where ";" is the
> > > >> delimiter). I just checked linux:include/linux/compiler_*.h; it is
> > > >> possible "," may appear in some attributes, so I will use ";" as the
> > > >> delimiter. Thanks for the clarification!
> > >
> > >
> > > Do you need to support multiple distinct __attribute__((XXX("stuff")))
> > > on one entity? If so, maybe it's worth considering how to encode them
> > > separately, rather than having the frontend have to concatenate them
> > > together?
>
> Typically the Linux kernel uses macros to represent attributes, e.g.,
> in linux/include/linux/compiler_types.h, we have
>   # define __kernel       __attribute__((address_space(0)))
>   # define __user         __attribute__((noderef, address_space(__user)))
>   # define __iomem        __attribute__((noderef, address_space(__iomem)))
>   # define __percpu       __attribute__((noderef, address_space(__percpu)))
>   # define __rcu          __attribute__((noderef, address_space(__rcu)))
>
> drivers/scsi/arcmsr/arcmsr_hba.c:       struct MessageUnit_A __iomem *reg = acb->pmuA;
> drivers/scsi/arm/acornscsi.h:    void __iomem   *base;  /* memc base address */
> ...
>
> As you can see, the convention is to define attributes with some
> easy-to-understand macros.
>
> For bpf_tag attributes, most use cases will be a bunch of "# define"
> for a single property, e.g.,
> #define __property1 __attribute__((bpf_tag("str1")))
> #define __property2 __attribute__((bpf_tag("str2")))
> #define __property3(str) __attribute__((bpf_tag("info " # str)))
> int var1 __property1;
> int var2 __property2;
> int var3 __property1 __property2;
> int var4 __property1 __property3("lock");
>
> So yes, we do want to support multiple bpf_tag attributes on one
> entity. It looks like the easiest way for LLVM internals and DWARF
> output is to have one consolidated string (see below too).
>
> > >
> > > One option would be to support multiple of the same attribute on the
> > > DIE in question - though that's probably still difficult to encode in the
> > > LLVM IR metadata (we don't have any repeating fields in the LLVM IR debug
> > > info metadata) - which, maybe comes back to the idea of having the frontend
> > > concatenate all the attributes together with some separator like ";".
> >
> >
> > I'd prefer not to have to parse strings and would rather have multiple
> > individual "tag" attributes, but it seems the DWARF v5 specification
> > explicitly prohibits multiple attributes with the same name on a
> > single DIE:
> >
> >   2.2 Attribute Types
> >   Each attribute value is characterized by an attribute name. No more than one
> >   attribute with a given name may appear in any debugging information entry.
> >   There are no limitations on the ordering of attributes within a debugging
> >   information entry.
>

Ah, my mistake then - yeah, then we'd have to go to a custom form (super
expensive for a bunch of reasons) or a whole custom tag (possible - tags
can be repeated) which might be overkill, more complicated in the LLVM IR
metadata, etc.
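
To make the constraint concrete, here is a toy sketch in C of the rule quoted
above: a DIE may carry at most one attribute with a given name, so a second
bpf_tag attribute on the same entity would have to be rejected. Everything
here is hypothetical and for illustration only: `toy_die`, `toy_die_add_attr`
and the attribute code `AT_HYPOTHETICAL_BPF_TAG` are made-up names, and this is
not how LLVM or any DWARF library actually models DIEs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a debugging information entry (DIE). This is an
   illustration only, not LLVM's or any DWARF library's data structure. */
enum { MAX_ATTRS = 8 };

/* Hypothetical attribute code picked from the DWARF vendor range
   (DW_AT_lo_user is 0x2000); no such attribute is actually assigned. */
#define AT_HYPOTHETICAL_BPF_TAG 0x2100u

struct toy_attr {
    unsigned name;      /* attribute name (code) */
    const char *value;  /* string value; not copied */
};

struct toy_die {
    struct toy_attr attrs[MAX_ATTRS];
    size_t nattrs;
};

/* Adds an attribute while enforcing "no more than one attribute with a
   given name". Returns 0 on success, -1 on a duplicate name or when the
   toy DIE is full. */
int toy_die_add_attr(struct toy_die *die, unsigned name, const char *value)
{
    assert(die != NULL);
    for (size_t i = 0; i < die->nattrs; i++) {
        if (die->attrs[i].name == name)
            return -1;  /* same attribute name already present */
    }
    if (die->nattrs == MAX_ATTRS)
        return -1;
    die->attrs[die->nattrs].name = name;
    die->attrs[die->nattrs].value = value;
    die->nattrs++;
    return 0;
}
```

With this model, adding "str1" under the hypothetical attribute succeeds and
adding "str2" under the same attribute fails, which is why the options left
are a custom form, a repeated custom tag, or one aggregated string.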


>
> Thanks for the pointer. I also checked the DWARF 5 spec,
>     7.5.5 Classes and Forms
> and indeed, I didn't find a FORM to represent a list of strings.
> So it looks like one consolidated string for all bpf_tag attributes might be
> the easiest way to go.
>

Yeah, I think that's probably the simplest thing to do as a first pass -
single aggregate value (single string, using whatever format/separators
seem suitable) which goes on the DIWhatever things it needs to and gets
lowered to a custom bpf-specific attribute in the resulting DWARF.
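
For what it's worth, the aggregation step could look roughly like the
following. This is only a minimal sketch in plain C, assuming ";" as the
delimiter (as proposed above) and assuming individual tags never contain ";"
themselves. `join_bpf_tags` and `count_bpf_tags` are made-up helper names, not
Clang or LLVM functions, and the real frontend code would use LLVM's own
string types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: joins n tag strings into out, separated by ';'.
   Returns 0 on success, or -1 if a tag contains the delimiter or the
   result (including the terminating NUL) does not fit in cap bytes.
   On failure, out may hold a partial, NUL-terminated result. */
int join_bpf_tags(const char *const *tags, size_t n, char *out, size_t cap)
{
    size_t used = 0;

    assert(out != NULL && cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(tags[i]);
        size_t need = len + (i > 0 ? 1 : 0);  /* tag plus optional ';' */

        if (strchr(tags[i], ';') != NULL)
            return -1;  /* would be ambiguous once joined */
        if (used + need + 1 > cap)
            return -1;  /* no room for tag and terminating NUL */
        if (i > 0)
            out[used++] = ';';
        memcpy(out + used, tags[i], len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}

/* Hypothetical consumer-side helper: counts the tags in a joined string.
   An empty string is treated as zero tags. */
size_t count_bpf_tags(const char *joined)
{
    size_t n;

    if (joined[0] == '\0')
        return 0;
    n = 1;
    for (const char *p = joined; *p != '\0'; p++) {
        if (*p == ';')
            n++;
    }
    return n;
}
```

Joining the two tags from the `int g __bpf_tag("str1") __bpf_tag("str2");`
example would give the single value "str1;str2". Rejecting tags that already
contain the delimiter is just one possible policy; escaping would be another,
at the cost of a slightly more involved parser on the consumer side.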