[PATCH] D103549: [POC] Put annotation strings into debuginfo.

Yonghong Song via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jun 2 12:06:54 PDT 2021


yonghong-song created this revision.
yonghong-song added a project: debug-info.
Herald added subscribers: dexonsmith, pengfei, JDevlieghere, hiraditya, kristof.beyls, mgorny.
yonghong-song requested review of this revision.
Herald added projects: clang, LLVM.
Herald added subscribers: llvm-commits, cfe-commits.

This is a proof-of-concept patch that seeks suggestions
from the llvm community on how to put an attribute with an
arbitrary string into the final debuginfo in the object file.

The Use Cases
=============

In the BPF ecosystem, BTF is the debuginfo format used for
validation and additional information.

  https://www.kernel.org/doc/html/latest/bpf/btf.html

Currently, BTF for vmlinux (x86_64, aarch64, etc.) is
generated by using pahole to convert dwarf to BTF, and
vmlinux BTF is used to validate bpf program compliance,
e.g., for certain tracing programs the bpf program signature
must match the kernel function signature. Beyond signature
checking, the following use cases would further help the
verifier.

1. Annotation of "user", "rcu", etc. for function arguments, structure fields and global/static variables. The kernel currently uses `address_space` attributes for the `sparse` tool, but we would like to carry this information into debuginfo. A previous attempt, https://reviews.llvm.org/D69393, tried to use `address_space` and was halted because it needed to touch a lot of other places in llvm.
2. Annotation of functions. Currently, the kernel tries to group functions with separate logic. Ideally one would write something like

     foo() __attribute__("property1", "property2")

   but since the above attribute is not supported, the kernel has to do some magic like

     global btf_property1:
       btf type id for foo, ...
     global btf_property2:
       btf type id for foo, ...

   This is really error prone, as the function definition may be under some configs and the `global btf_property1 ...` list may not even be in the same source file as the function. Such a disconnect between function definition and function attributes has already caused numerous issues.

  We also want to annotate functions with certain pre-conditions (e.g., a socket lock is held), as bpf programs have started to call kernel functions. Such annotations should be applied directly to the function definition to avoid any later mismatch.
3. Annotation of structures, e.g., if the fields of a structure may have been randomized, the verifier should know it, as it can no longer trust the structure layout in debuginfo.
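
To make the disconnect described in item 2 concrete, here is a minimal sketch in C. All names (`sock_lock_held_funcs`, `listed_as_sock_lock_held`, `foo_locked`, `bar_locked`) are hypothetical and are not taken from the kernel. The side table only stands in for the `global btf_property1 ...` lists, and the `__has_attribute` guard is an assumption so that the file also builds with compilers that lack `annotate`.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: fall back to an empty macro on compilers without "annotate". */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif
#if __has_attribute(annotate)
#define __sock_lock_held __attribute__((annotate("sock_lock_held")))
#else
#define __sock_lock_held
#endif

/* Style 1 (hypothetical side table): the property is recorded by name, away
 * from the definitions. Nothing ties an entry to an actual function, so
 * "bar_locked" stays listed even though this file never defines it. */
static const char *const sock_lock_held_funcs[] = { "foo_locked", "bar_locked" };

/* Return 1 if `name` appears in the side table, 0 otherwise. */
int listed_as_sock_lock_held(const char *name)
{
    size_t n = sizeof(sock_lock_held_funcs) / sizeof(sock_lock_held_funcs[0]);
    for (size_t i = 0; i < n; i++)
        if (strcmp(sock_lock_held_funcs[i], name) == 0)
            return 1;
    return 0;
}

/* Style 2: the property sits on the definition itself, so it cannot be
 * separated from the function by a config option or a file move. */
__sock_lock_held int foo_locked(int x)
{
    return x + 1;
}
```

The stale `bar_locked` entry is the kind of mismatch that goes unnoticed with a side table, whereas an annotation on the definition disappears together with the function.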

Sorry for the dense explanation of the use cases. The main takeaway
is that we want to annotate structures, fields, functions, arguments
and variables with *arbitrary* strings and want such strings to be
preserved in the final dwarf (or BTF) output.

An Example
==========

In this patch, I hacked the clang frontend to put annotations
in debuginfo and hacked llvm/CodeGen to "output" these
annotations into BTF. The target architecture is x86.
Note that I do not actually emit these attributes into BTF yet;
the patch only prints a TODO line for each one.
I would like to seek the llvm community's advice first.

Below is an example showing what the source code looks like.
I am using the "annotate" attribute as it accepts arbitrary strings.

  $ cat t1.c
  /* a pointer pointing to user memory */
  #define __user __attribute__((annotate("user")))
  /* a pointer protected by rcu */
  #define __rcu __attribute__((annotate("rcu")))
  /* the struct has some special property */
  #define __special_struct __attribute__((annotate("special_struct")))
  /* sock_lock is held for the function */
  #define __sock_lock_held __attribute__((annotate("sock_lock_held")))
  /* the hash table element type is socket */
  #define __special_info __attribute__((annotate("elem_type:socket")))
  
  struct hlist_node;
  struct hlist_head {
    struct hlist_node *prev;
    struct hlist_node *next;
  } __special_struct;
  struct hlist {
     struct hlist_head head __special_info;
  };
  
  extern void bar(struct hlist *);
  int foo(struct hlist *h,  int *a __user, int *b __rcu) __sock_lock_held {
    bar(h);
    return *a + *b;
  }
  $ clang --target x86_64 -O2 -c -g t1.c
  TODO (BTF2Debug.cpp): Add func arg 'a' annotation 'user' to .BTF section
  TODO (BTF2Debug.cpp): Add func arg 'b' annotation 'rcu' to .BTF section
  TODO (BTF2Debug.cpp): Add subroutine 'foo' annotation 'sock_lock_held' to .BTF section
  TODO (BTF2Debug.cpp): Add field 'head' annotation 'elem_type:socket' to .BTF section
  TODO (BTF2Debug.cpp): Add struct 'hlist_head' annotation 'special_struct' to .BTF section
  $

What Is Next
============

First, using the "annotate" attribute is not the best choice, as it
generates extra globals and IR. Maybe a different clang-specific
attribute would be better?
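
For context, here is a rough sketch of where the extra globals and IR come from. As far as I know, clang today records an `annotate` attribute on a function or global in the `@llvm.global.annotations` array (with the strings placed in the `llvm.metadata` section) and lowers one on a local variable or parameter to a call to the `llvm.var.annotation` intrinsic. The function and variable names below are made up for illustration, and the `__has_attribute` guard is an assumption so the file builds with other compilers as well.

```c
/* Assumption: fall back to empty macros on compilers without "annotate". */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif
#if __has_attribute(annotate)
#define __user __attribute__((annotate("user")))
#define __rcu __attribute__((annotate("rcu")))
#define __sock_lock_held __attribute__((annotate("sock_lock_held")))
#else
#define __user
#define __rcu
#define __sock_lock_held
#endif

/* Annotated global: expected to add an entry to @llvm.global.annotations. */
int shared_counter __rcu = 40;

/* Annotated parameters: expected to become llvm.var.annotation calls
 * inside the function body. */
int add_user_value(int *a __user, int *b __rcu)
{
    return *a + *b;
}

/* Annotated function: expected to add another @llvm.global.annotations
 * entry. The annotation does not change what the function computes. */
__sock_lock_held int read_counter_plus_two(void)
{
    int two = 2;
    return add_user_value(&two, &shared_counter);
}
```

Compiling this with `clang -S -emit-llvm` and reading the output is one way to check what a given clang version actually emits.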

Second, in the above example I tried to put these attributes in BTF
because I researched and did not find a way to put them in dwarf.
Is there a way to put them into dwarf? That would work for us too.
Otherwise, we can let x86/arm64 etc. generate BTF (behind a flag, of
course), which will carry this attribute information.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D103549

Files:
  clang/lib/CodeGen/CGDebugInfo.cpp
  llvm/include/llvm/IR/DebugInfoMetadata.h
  llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
  llvm/lib/CodeGen/AsmPrinter/BTF2Debug.cpp
  llvm/lib/CodeGen/AsmPrinter/BTF2Debug.h
  llvm/lib/CodeGen/AsmPrinter/CMakeLists.txt

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D103549.349344.patch
Type: text/x-patch
Size: 34250 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20210602/73d7b8b5/attachment-0001.bin>

