[PATCH] D70696: [DebugInfo] Support to emit debugInfo for extern variables

Tue Dec 3 17:42:09 PST 2019

yonghong-song added a comment.

In D70696#1767616 <https://reviews.llvm.org/D70696#1767616>, @dblaikie wrote:

> Many of the test cases could be collapsed into one file - using different variables that are used, unused, locally or globally declared, etc.

Okay. I will try to consolidate them into one file, or at least fewer files. Originally, I used separate files to guard against clang changing the emission order of different global variables in the future.

> Is this new code only active for C compilations? (does clang reject requests for the bpf target when the input is C++?) I ask due to the concerns around globals used in inline functions where the inline function is unused - though C has inline functions too, so I guess the question stands: Is that a problem? What happens?

Currently, yes. My implementation is only active for C compilation.
In the kernel documentation (https://www.kernel.org/doc/Documentation/networking/filter.txt), we have:

  The new instruction set was originally designed with the possible goal in
  mind to write programs in "restricted C" and compile into eBPF with a optional
  GCC/LLVM backend, so that it can just-in-time map to modern 64-bit CPUs with
  minimal performance overhead over two steps, that is, C -> eBPF -> native code.

With LLVM itself, people can compile a C++ program for the BPF target, but "officially" we do not
support this. That is why I restricted the change to C only. For C++ programs, we don't get much
usage or testing from users.

Do you have a concrete example for this? I tried the following:

  -bash-4.4$ cat t.h
  inline int foo() { extern int g; return g; }
  -bash-4.4$ cat t.c
  int bar() { return 0; }
  -bash-4.4$ clang -target bpf -g -O0 -S -emit-llvm t.c

`foo` is not used, and clang seems smart enough to deduce that `g` is not used either, so no debuginfo is emitted in this case.

In general, if an inline function is not used but an external variable is used inside that inline function, the worst case is extra debuginfo for that external variable. Since the variable is not used, it won't impact the bpf loader.

> Should this be driven by a lower level of code generation - ie: is it OK to only produce debug info descriptions for variables that are referenced in the resulting LLVM IR? (compile time constants wouldn't be described then, for instance - since they won't be code generated, loaded from memory, etc)

Yes, it is OK to produce debug info only for variables that are referenced in the resulting LLVM IR. But we are discussing extern variables here, not compile-time constants. Maybe I am missing something?

> Is there somewhere I should be reading about the design requirement for these global variable descriptions to understand the justification for them & the ramifications if there are bugs that cause them not to be emitted?

We do not have design documents yet. The following are two links and I can explain more:

1. https://lore.kernel.org/bpf/CAEf4BzYCNo5GeVGMhp3fhysQ=_axAf=23PtwaZs-yAyafmXC9g@mail.gmail.com/T/#t

The typical config is at /boot/config-<...> on a Linux machine. The config entries typically look like:

  CONFIG_CC_IS_GCC=y
  CONFIG_GCC_VERSION=40805
  CONFIG_INITRAMFS_SOURCE=""

Suppose a bpf program wants to check a config value and do something based on it. The user can write:

  extern bool CONFIG_CC_IS_GCC;
  extern int CONFIG_GCC_VERSION;
  extern char CONFIG_INITRAMFS_SOURCE[20];
  ...
  if (CONFIG_CC_IS_GCC) ...
  map_val = CONFIG_GCC_VERSION;
  __builtin_memcpy(map_value, CONFIG_INITRAMFS_SOURCE, 8);

The bpf loader will create a data section to store all the above info and patch the correct addresses into the code.
Without extern var type info, it becomes a guessing game what type/size the user is using.
With precise type information, the bpf loader is able to do the relocation much more easily.
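
To illustrate why the size and alignment matter, here is a minimal sketch of how a loader could lay out such a data section once it knows each extern variable's type. Everything here is hypothetical: `struct extern_desc` and `layout_externs` are made-up names, not libbpf or kernel APIs, and the sizes are assumed to have been read from the type information already.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one extern variable, as a loader might
 * recover it from type information. These names are illustrative
 * only and do not come from libbpf or the kernel. */
struct extern_desc {
    const char *name;   /* symbol name, e.g. "CONFIG_GCC_VERSION" */
    size_t size;        /* size in bytes, taken from the type info */
    size_t align;       /* alignment in bytes, must be a power of two */
    size_t offset;      /* filled in by layout_externs() */
};

/* Assign each variable an aligned offset inside one data section and
 * return the total section size in bytes. */
size_t layout_externs(struct extern_desc *descs, size_t n)
{
    size_t off = 0;

    for (size_t i = 0; i < n; i++) {
        size_t a = descs[i].align;

        assert(a != 0 && (a & (a - 1)) == 0);
        off = (off + a - 1) & ~(a - 1);   /* round up to the alignment */
        descs[i].offset = off;
        off += descs[i].size;
    }
    return off;
}
```

Without the `size` and `align` fields the loader would have to guess, for example, whether `CONFIG_GCC_VERSION` is read as 4 or 8 bytes.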

2. https://lore.kernel.org/bpf/87eez4odqp.fsf@toke.dk/T/#m8d5c3e87ffe7f2764e02d722cb0d8cbc136880ed

This is for bpf program verification.
For example,
bpf_prog1:

  foo(...) {
    ... x ... y ...
    z =  bar(x /*struct t * */, y /* int */);
    ...
  }

and there is no bar body available yet.
The kernel verifier is still able to verify program "foo"
and make sure the types of all arguments passed to bar
are correct.

Later, suppose there is a program
prog2(struct t *a, int b)
which is verified independently.

Then, in the kernel, prog1 can call prog2 if their parameter types
and return types match. This is the BPF way of dynamic linking.
The types of externally used functions can help cut down
the verification cost at linking time.
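
As a rough sketch of the matching step, assuming each prototype has been reduced to a return type id plus a list of parameter type ids: `struct func_proto`, `protos_match` and `MAX_PARAMS` are hypothetical names, and comparing ids for equality is a simplification of whatever type comparison the kernel would actually perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARAMS 5   /* assumed upper bound on arguments, for this sketch */

/* Hypothetical prototype record. The type ids stand in for whatever
 * identifiers the type information assigns to each type. */
struct func_proto {
    unsigned ret_type;
    unsigned nr_params;
    unsigned param_types[MAX_PARAMS];
};

/* Return true if the prototype the caller was verified against
 * ("want") agrees with the prototype of the callee ("have"). */
bool protos_match(const struct func_proto *want,
                  const struct func_proto *have)
{
    assert(want->nr_params <= MAX_PARAMS);
    assert(have->nr_params <= MAX_PARAMS);

    if (want->ret_type != have->ret_type)
        return false;
    if (want->nr_params != have->nr_params)
        return false;
    for (size_t i = 0; i < want->nr_params; i++) {
        if (want->param_types[i] != have->param_types[i])
            return false;
    }
    return true;
}
```

Because both sides were verified against their own prototypes beforehand, a check of this shape is all that is left to do when the two programs are linked.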

If there is no debug information for these extern variables, the current
proposal is for the bpf loader and verifier to fail. Users can always work around
this by creating bpf maps for the first use case (which is more expensive and less user friendly) and by doing static
linking before loading into the kernel for the second use case.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70696/new/

https://reviews.llvm.org/D70696