[cfe-dev] emit annotate attribute strings to final debuginfo in object file

Y Song via cfe-dev cfe-dev at lists.llvm.org
Wed Jun 2 12:25:14 PDT 2021


Hi,

This is to seek advice on whether and how we could put annotate attribute
strings into the final debuginfo in the object file. I have implemented a
POC patch to illustrate what we really want. The patch is here:
https://reviews.llvm.org/D103549. The use case is vmlinux BTF.
Currently, vmlinux BTF is generated by pahole, which converts vmlinux
dwarf to BTF. The architectures include x86, x86_64, arm64, ppc, etc.

Below are detailed explanations of the use cases and a concrete
source example.
I didn't find an easy way to pass such annotate attribute strings to
dwarf. Maybe I am wrong, and I am happy to hear suggestions. Also, I
"abused" annotate attributes here; maybe we could have a different one?

The Use Cases
============

In the BPF ecosystem, BTF is the debuginfo format used for validation
and for providing additional information.

https://www.kernel.org/doc/html/latest/bpf/btf.html

Currently, BTF in vmlinux (x86_64, aarch64, etc.) is generated by
using pahole to convert dwarf to BTF. vmlinux BTF is used to validate
bpf program compliance; e.g., for certain tracing programs, the bpf
program signature must match the kernel function signature. vmlinux
BTF is also used for relocation, as its structure layout information
is considered the ultimate truth of the running system. Beyond these
and other usages, the following use cases would further help the
verifier.

Annotation of "user"/"rcu" etc. for function arguments, structure
fields and global/static variables. The kernel currently uses
address_space attributes for the sparse tool, but we would like to
carry this information into debuginfo. A previous attempt,
https://reviews.llvm.org/D69393, tried to use address_space but was
halted because it needed to touch many other places in llvm.

Annotation of functions. Currently, the kernel groups functions with
separate logic. We would like to write, e.g.,

  foo() attribute("property1", "property2")

but since such an attribute is not supported, the kernel has to do
some magic like

  global btf_property1: btf type id for foo, ...
  global btf_property2: btf type id for foo, ...

This is really error prone, as the function definition may be under
some configs, and the global btf_property1 ... may not even be in the
same source file as the function. Such a disconnect between function
definitions and function attributes has already caused numerous
issues.
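
To make that failure mode concrete, here is a minimal, hypothetical C
sketch of the side-table pattern: properties live in a table keyed by
function name, separately from the definitions. The names
(func_property, count_dangling, the property strings) are invented for
illustration and do not reproduce the kernel's actual macros; the
point is only that nothing ties a table entry to a definition, so an
entry can outlive the function it describes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical side table: function properties recorded away from
 * the function definitions themselves. */
struct func_property {
    const char *func;
    const char *property;
};

/* Count table entries whose function is not among the functions that
 * were actually built (e.g. compiled out by a config option). Nothing
 * in the build itself performs this check. */
size_t count_dangling(const struct func_property *table, size_t ntable,
                      const char *const *built, size_t nbuilt)
{
    size_t dangling = 0;
    for (size_t i = 0; i < ntable; i++) {
        int found = 0;
        for (size_t j = 0; j < nbuilt; j++) {
            if (strcmp(table[i].func, built[j]) == 0) {
                found = 1;
                break;
            }
        }
        if (!found)
            dangling++;
    }
    return dangling;
}
```

With an annotation attached directly to the definition, the property
would disappear together with the function, and no such cross-check
would be needed.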

We also want to annotate functions with certain pre-conditions (e.g.,
a socket lock has been held), as bpf programs have started to call
kernel functions. Such annotations should really be applied directly
to the function definition to avoid any potential mismatch later.

Annotation of structures. For example, if the fields of a structure
may have been randomized, the verifier should know it, as it can no
longer trust the debuginfo structure layout.
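
For the pre-condition case above, here is a rough, hypothetical C
sketch of how a verifier could use such a tag: a call to a kernel
function tagged "sock_lock_held" (the string used in the example
below) is refused unless the program state says the lock is held.
call_allowed and the single boolean lock state are simplifications
invented here, not actual verifier code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: a call is allowed only if every pre-condition
 * tag attached to the callee is satisfied by the current state. */
bool call_allowed(const char *const *callee_tags, size_t ntags,
                  bool sock_lock_held)
{
    for (size_t i = 0; i < ntags; i++) {
        if (strcmp(callee_tags[i], "sock_lock_held") == 0 && !sock_lock_held)
            return false;
        /* Other tags are ignored in this sketch. */
    }
    return true;
}
```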

Sorry for the terse explanation of use cases. The main takeaway
is that we want to annotate structure/field/func/argument/variable
with *arbitrary* strings, and want such strings to be preserved
in the final dwarf (or BTF) output.

An Example
=========

In this patch, I hacked the clang frontend to put annotations
in debuginfo and hacked llvm/CodeGen to "output" these
annotations into BTF. The target architecture is x86.
Note that the patch does not actually emit these attributes into
BTF yet; I would like to seek the llvm community's advice first.

Below is an example to show what the source code looks like.
I am using the "annotate" attribute as it accepts arbitrary strings.

$ cat t1.c
/* a pointer pointing to user memory */
#define __user __attribute__((annotate("user")))
/* a pointer protected by rcu */
#define __rcu __attribute__((annotate("rcu")))
/* the struct has some special property */
#define __special_struct __attribute__((annotate("special_struct")))
/* sock_lock is held for the function */
#define __sock_lock_held __attribute__((annotate("sock_lock_held")))
/* the hash table element type is socket */
#define __special_info __attribute__((annotate("elem_type:socket")))

struct hlist_node;
struct hlist_head {
  struct hlist_node *prev;
  struct hlist_node *next;
} __special_struct;
struct hlist {
   struct hlist_head head __special_info;
};

extern void bar(struct hlist *);
int foo(struct hlist *h,  int *a __user, int *b __rcu) __sock_lock_held {
  bar(h);
  return *a + *b;
}
$ clang --target x86_64 -O2 -c -g t1.c
TODO (BTF2Debug.cpp): Add func arg 'a' annotation 'user' to .BTF section
TODO (BTF2Debug.cpp): Add func arg 'b' annotation 'rcu' to .BTF section
TODO (BTF2Debug.cpp): Add subroutine 'foo' annotation 'sock_lock_held' to .BTF section
TODO (BTF2Debug.cpp): Add field 'head' annotation 'elem_type:socket' to .BTF section
TODO (BTF2Debug.cpp): Add struct 'hlist_head' annotation 'special_struct' to .BTF section
$
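
The annotations in t1.c can be pictured as a small lookup table that a
BTF consumer such as the verifier might query. The C sketch below is
hypothetical: struct annot_rec, t1_annots and find_annotation are
invented names, and the records are written by hand from the TODO
lines above rather than produced by the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: an annotation on a declaration ("owner"), or on
 * one of its fields/arguments when component is non-NULL. */
struct annot_rec {
    const char *owner;      /* struct or function name */
    const char *component;  /* field or argument name, or NULL */
    const char *value;      /* arbitrary annotation string */
};

/* The five annotations from t1.c, transcribed by hand. */
static const struct annot_rec t1_annots[] = {
    { "foo",        "a",    "user" },
    { "foo",        "b",    "rcu" },
    { "foo",        NULL,   "sock_lock_held" },
    { "hlist",      "head", "elem_type:socket" },
    { "hlist_head", NULL,   "special_struct" },
};

/* Two components match if both are NULL or both name the same member. */
static int same_component(const char *x, const char *y)
{
    if (x == NULL || y == NULL)
        return x == y;
    return strcmp(x, y) == 0;
}

/* Return the first annotation on owner/component, or NULL if none. */
const char *find_annotation(const struct annot_rec *recs, size_t n,
                            const char *owner, const char *component)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(recs[i].owner, owner) == 0 &&
            same_component(recs[i].component, component))
            return recs[i].value;
    return NULL;
}
```

For instance, looking up owner "foo" with component "a" yields "user",
while component "h" yields NULL because that argument carries no
annotation.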

What Is Next
==========

First, using the "annotate" attribute is not the best choice, as it
generates extra globals and IR. Maybe we should use a different,
clang-specific attribute?

Second, in the above example, I put these attributes into BTF because,
after some research, I could not find a way to put them into dwarf.
Is there a way to put them into dwarf? That would work for us too.
Otherwise, we could let x86/arm64 etc. generate BTF (behind a flag,
of course) that carries this attribute information.
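
BTF refers to names through offsets into a string section, so a
BTF-based carrier for annotations could plausibly store the annotation
strings the same way. The C sketch below is purely hypothetical: the
annot_tag record (target type id, a component index where -1 means the
type or function itself, and a string offset) and the strtab helpers
are invented here to show the shape of such an encoding, not a
proposed format.

```c
#include <stdint.h>
#include <string.h>

#define STRTAB_MAX 256

/* Hypothetical string table: NUL-terminated strings packed back to
 * back. Offset 0 holds the empty string, so 0 can mean "no name". */
struct strtab {
    char data[STRTAB_MAX];
    uint32_t len;
};

void strtab_init(struct strtab *t)
{
    t->data[0] = '\0';
    t->len = 1;
}

/* Return the offset of s, appending it only if not already present.
 * Returns UINT32_MAX if the table is full. */
uint32_t strtab_add(struct strtab *t, const char *s)
{
    for (uint32_t off = 0; off < t->len;
         off += (uint32_t)strlen(t->data + off) + 1)
        if (strcmp(t->data + off, s) == 0)
            return off;
    size_t n = strlen(s) + 1;
    if (t->len + n > STRTAB_MAX)
        return UINT32_MAX;
    memcpy(t->data + t->len, s, n);
    uint32_t off = t->len;
    t->len += (uint32_t)n;
    return off;
}

/* Hypothetical annotation record referring to an existing type id. */
struct annot_tag {
    uint32_t type_id;       /* struct or function the tag belongs to */
    int32_t component_idx;  /* -1: the type itself; >= 0: field/arg index */
    uint32_t value_off;     /* offset of the annotation string */
};
```

For foo in t1.c, the "user" tag on a would become a record with foo's
type id, component index 1 (the second parameter) and the offset of
"user", while the "sock_lock_held" tag would use component index -1.
Deduplicating strings keeps repeated tags such as "user" or "rcu" from
growing the string section.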

