[llvm-dev] [RFC] Building LLVM-Debuginfod

Mon Sep 27 06:45:00 PDT 2021

Debuginfod is a simple protocol
<https://sourceware.org/elfutils/Debuginfod.html> allowing a client to
fetch debug information about a stripped binary from a server by supplying
its build-id. We would like to add debuginfod support to LLVM to make it
easier for users to debug stripped binaries.
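
As a rough illustration of the lookup described above, the sketch below builds the request URL for a given build-id. The path layout (`/buildid/<BUILDID>/debuginfo` and `/buildid/<BUILDID>/executable`) follows the elfutils debuginfod documentation; the function name `artifactUrl`, the `ArtifactKind` enum, and the parameter names are assumptions made for this example and are not part of any proposed LLVM API.

```cpp
#include <cassert>
#include <string>

// Kinds of artifacts a debuginfod server can be asked for in this sketch.
enum class ArtifactKind { DebugInfo, Executable };

// Builds the request URL for a build ID using the path layout
// <server>/buildid/<BUILDID>/<kind>. All names here are hypothetical.
std::string artifactUrl(const std::string &serverUrl,
                        const std::string &buildIdHex, ArtifactKind kind) {
  assert(!buildIdHex.empty() && "build ID must be a non-empty hex string");
  std::string url = serverUrl;
  // Avoid a doubled slash when the server URL already ends with one.
  if (!url.empty() && url.back() == '/')
    url.pop_back();
  url += "/buildid/" + buildIdHex;
  url += (kind == ArtifactKind::DebugInfo) ? "/debuginfo" : "/executable";
  return url;
}
```

A client would pass the resulting URL to its HTTP library; source files are requested in a similar way, with the source path appended after `/source`.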

In summary, we would like to:

   1. Build an LLVM debuginfod client library in LLVM.
   2. Integrate the client library into llvm-symbolizer and other tools.
   3. Build a simple LLVM debuginfod server library for local development use.

The client and server tasks are separate and we have already begun work on
the client side. The server is nothing more than a simple web server that
serves static content associated with the build_id in the URL. We are eager
to get feedback and advice from the community.

HTTP Client

The debuginfod client library will require an HTTP client, which should be
fully featured to maximize compatibility across various network
configurations and debuginfo server implementations. We were leaning
towards libcurl <https://curl.se/libcurl/> which is widely available.

HTTP Server

The debuginfod server is essentially a static file server using HTTP over
TCP/IP. We propose to use the TCPSocket library from lldb-server
<https://lldb.llvm.org/use/remote.html> and build a very small HTTP file
server on top using LLVM regular expressions
<https://llvm.org/doxygen/classllvm_1_1Regex.html>. We built a
proof-of-concept; more details can be found below under HTTP Server
Benchmarks. This would require a refactor of the LLDB IOObject hierarchy
into LLVM as described in our previous RFC
<https://groups.google.com/g/llvm-dev/c/IbvheetpGKs>.
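
To give a sense of how small the regex-based request handling could be, here is a minimal sketch that parses an HTTP request line for a debuginfod path. It uses `std::regex` as a stand-in for `llvm::Regex` so that it compiles on its own; the `DebuginfodRequest` struct and `parseRequestLine` function are hypothetical names, and a real server would also need to handle headers, source requests, and errors.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Result of parsing a request line such as
// "GET /buildid/<hex>/debuginfo HTTP/1.1".
struct DebuginfodRequest {
  std::string buildId;  // lowercase hex build ID taken from the URL
  std::string artifact; // "debuginfo" or "executable"
};

// Returns the parsed request, or std::nullopt if the line is not a GET for
// a recognized debuginfod path. std::regex stands in for llvm::Regex here.
std::optional<DebuginfodRequest> parseRequestLine(const std::string &line) {
  static const std::regex pattern(
      R"(^GET /buildid/([0-9a-f]+)/(debuginfo|executable) HTTP/1\.[01]$)");
  std::smatch match;
  if (!std::regex_match(line, match, pattern))
    return std::nullopt;
  assert(match.size() == 3 && "pattern has exactly two capture groups");
  return DebuginfodRequest{match[1].str(), match[2].str()};
}
```

Restricting the build-id to hex digits also rejects path-traversal attempts such as `/buildid/../...` before any file is opened.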

As a short-term solution, since the refactor may take some time, we propose
to use the lightweight cpp-httplib library to get a server up and running.

Database

Debuginfod needs to run key-value lookup queries to serve requests.
Specifically, it must implement these logical mappings:

   - build_id -> executable_path
   - (build_id, source_path) -> source_path
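
The two mappings above can be sketched with ordinary in-memory tables. This is only an illustration: the class name `DebuginfoIndex` and its methods are hypothetical, and the sketch assumes the value of the second mapping is the on-disk location of the source file, which the text above does not spell out.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// In-memory index holding the two logical mappings (hypothetical sketch).
class DebuginfoIndex {
public:
  void addExecutable(const std::string &buildId, const std::string &path) {
    assert(!buildId.empty() && "build ID must not be empty");
    executables_[buildId] = path;
  }

  // Assumption: localPath is where the source file lives on the server.
  void addSource(const std::string &buildId, const std::string &sourcePath,
                 const std::string &localPath) {
    assert(!buildId.empty() && "build ID must not be empty");
    sources_[std::make_pair(buildId, sourcePath)] = localPath;
  }

  std::optional<std::string> findExecutable(const std::string &buildId) const {
    auto it = executables_.find(buildId);
    if (it == executables_.end())
      return std::nullopt;
    return it->second;
  }

  std::optional<std::string> findSource(const std::string &buildId,
                                        const std::string &sourcePath) const {
    auto it = sources_.find(std::make_pair(buildId, sourcePath));
    if (it == sources_.end())
      return std::nullopt;
    return it->second;
  }

private:
  std::unordered_map<std::string, std::string> executables_;
  std::map<std::pair<std::string, std::string>, std::string> sources_;
};
```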

In pursuit of a mechanism to perform these lookups, we can draw inspiration
from clangd's indexing scheme <https://clangd.llvm.org/design/indexing.html>
and Apple’s index-while-building feature
<https://docs.google.com/document/d/1cH2sTpgSnJZCkZtJl1aY-rzy4uGPcrI-6RrUpdATO2Q/edit>.
Both of these implementations ultimately solve the same problem, supporting
fast lookups of e.g. symbol occurrences in source code, but they take
different approaches. Clangd produces indexes in memory and then writes
them to disk in a way that enables efficient queries on-demand. Apple’s
index-while-building libraries instead dump unindexed record and unit files
to a directory using the LLVM bitstream file format. These files must then
be indexed into lookup tables to answer queries on-demand – a task left to
other tooling. For example, Apple’s IndexStoreDB
<https://github.com/apple/indexstore-db> tool ingests the record and unit
files to produce key-value tables stored on-disk with LMDB
<http://www.lmdb.tech/doc/>.

To estimate the size of the database needed for a local debuginfo server,
we indexed the binaries of the LLVM project (compiled with standard Debug
mode options) using elfutils-debuginfod. The resulting SQLite database
occupied just 35MB on disk, which could easily fit in a typical developer’s
RAM. Of course, production debuginfo servers may index far more binaries,
and the database could easily reach several GB.

Based on the small size required for a local debuginfod database, we think
it should be fine to take the clangd-style approach out-of-the-box, while
providing an interface compatible with LMDB (or LevelDB, etc.) to assist
with production debuginfod servers. Specifically, when built without LMDB,
llvm-debuginfod would index debuginfo using a few in-memory hash tables,
then atomically serialize them to LLVM’s OnDiskHashTable
<https://github.com/llvm/llvm-project/blob/d480f968ad8b56d3ee4a6b6df5532d485b0ad01e/llvm/include/llvm/Support/OnDiskHashTable.h>
(or subclasses). We would leave LMDB libraries as an optional dependency
that could be used for production debuginfod servers.
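
One common way to make the serialization step atomic is to write to a temporary file and rename it into place. The sketch below shows that pattern only; the tab-separated text format is a placeholder for OnDiskHashTable, and `writeIndexAtomically` and the `.tmp` suffix are names invented for this example.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <map>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Writes the table to finalPath so that readers never observe a partial
// file: data goes to a temporary sibling first and is then renamed into
// place. The text format is a placeholder, not OnDiskHashTable.
bool writeIndexAtomically(const std::map<std::string, std::string> &table,
                          const fs::path &finalPath) {
  assert(!finalPath.empty() && "destination path is required");
  fs::path tempPath = finalPath;
  tempPath += ".tmp";
  {
    std::ofstream out(tempPath, std::ios::trunc);
    if (!out)
      return false;
    for (const auto &[buildId, path] : table)
      out << buildId << '\t' << path << '\n';
    out.flush();
    if (!out)
      return false;
  } // The stream is closed here, before the rename.
  std::error_code ec;
  // On POSIX systems rename() replaces the destination atomically.
  fs::rename(tempPath, finalPath, ec);
  return !ec;
}
```

Keeping the temporary file in the same directory as the destination keeps both on the same filesystem, which the rename step relies on.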

Build system integration

llvm-debuginfod would be built as an implicit project similar to
llvm-objdump and located at ./llvm/tools/llvm-debuginfod.

The client introduces a mandatory dependency on libcurl. This could be
handled similarly to libxml in the llvm/CMakeLists.txt.

At first, the server will introduce a mandatory dependency on cpp-httplib
until we can complete the refactor to move LLDB’s TCPSocket into LLVM.
After the refactor, the server will have no additional mandatory
dependencies. It will then have only optional dependencies on
cpp-httplib (or another HTTP server library) and LMDB, both intended for
production debuginfod servers.

Client integration with other LLVM tools

The llvm-debuginfod server itself will require client functionality
provided by a llvm-libdebuginfod-client library for federation to other
debuginfo servers. We would wrap this library with a thin
llvm-debuginfod-find tool in the same source directory with functionality
similar to the debuginfod-find tool in elfutils. The library will make it
simpler to bring the fetch capabilities to other tools like llvm-symbolizer
in a nearly-uniform way. Specifically, the llvm-libdebuginfod-client
library code will query the servers in the DEBUGINFOD_URLS environment
variable for a given build_id. This client would be invoked from tools like
llvm-symbolizer by adding a small (~10 line) change to the tool. The needed
changes may be slightly different for each tool due to the different ways
that debuginfo is found and handled. This makes it less of a “black-box”
replacement than FUSE-based approaches <https://github.com/edolstra/dwarffs>,
but leaves tool designers more flexibility to select if / when / how
debuginfod servers are queried.
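
The server-selection step can be sketched as follows. The elfutils documentation describes DEBUGINFOD_URLS as a space-separated list of URL prefixes; the function names `splitServerUrls` and `candidateDebugInfoUrls` are hypothetical, and the sketch only produces the URLs to try rather than performing any HTTP requests.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits the value of DEBUGINFOD_URLS into individual server URLs,
// treating any run of whitespace as a separator.
std::vector<std::string> splitServerUrls(const std::string &envValue) {
  std::vector<std::string> urls;
  std::istringstream stream(envValue);
  std::string url;
  while (stream >> url)
    urls.push_back(url);
  return urls;
}

// Lists the debuginfo URLs to try, in order, for one build ID.
std::vector<std::string> candidateDebugInfoUrls(const std::string &envValue,
                                                const std::string &buildIdHex) {
  assert(!buildIdHex.empty() && "build ID must be a non-empty hex string");
  std::vector<std::string> candidates;
  for (std::string server : splitServerUrls(envValue)) {
    // Extracted tokens are never empty, so back() is safe here.
    if (server.back() == '/')
      server.pop_back();
    candidates.push_back(server + "/buildid/" + buildIdHex + "/debuginfo");
  }
  return candidates;
}
```

A tool would read the variable with `std::getenv("DEBUGINFOD_URLS")`, treat a null result as "no servers configured", and try each candidate in turn until one succeeds.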

Implementation Schedule

We propose to split the implementation of llvm-debuginfod into 4 patches.

   - Patch 1 will add the HTTP client dependency (libcurl) to the build
     system.
   - Patch 2 will create the client library llvm-debuginfod-client and
     implement llvm-debuginfod-find. Public debuginfo servers and/or a local
     Python http.server (f.k.a. SimpleHTTPServer) could be used for the
     initial test suite.
   - Patch 3 will make the changes to llvm-symbolizer needed for it to use
     the llvm-debuginfod-client library.
   - Patch 4 will implement debuginfod (the server).

These patches have the following logical dependencies:

1-->2-->3
     \
      ->4

Why another implementation of debuginfod?

Beyond the licensing aspect, there are benefits to having multiple
cross-compatible implementations of the debuginfo client and server. There
is also an opportunity to optimize performance and add new features beyond
elfutils’ debuginfod. For example, we could wrap the client with LLVM’s FS
caching, fix issues such as infinite traversal when following symlinks, and
extend support to Mach-O and other binary formats.

HTTP Server Benchmarks

The debuginfod HTTP server is essentially a static file server. We expect
that users who run production debuginfo servers will implement some
advanced / customized configuration with load balancing etc. We wish to
target developers who will want something simple that works “out of the
box” on their local machine. With this user in mind, we should minimize
dependencies and maintain good perf for even large (say, 5GB) debuginfo.

To help pick an HTTP server, I have run preliminary tests comparing the
following:

   1. libmicrohttpd <https://www.gnu.org/software/libmicrohttpd/>
   2. nginx <https://github.com/nginx/nginx>
   3. h2o <https://github.com/h2o/h2o>
   4. cpp-httplib <https://github.com/yhirose/cpp-httplib>
   5. “llvm-httpd” <https://user.git.corp.google.com/shutty/llvm-http/>,
      explained below.

These servers were chosen to provide representatives from across the
spectrum of candidate libraries. nginx and h2o are highly-optimized and
fully-featured configurable standalone webservers. libmicrohttpd is a GNU
library used by elfutils-debuginfod. cpp-httplib is a small single-header
client-server library. Finally, “llvm-httpd” is a custom barebones 100-line
HTTP fileserver using the posix socket and extended regex interfaces. This
is intended as a stand-in for a small custom solution built on the LLDB
TCPSocket library and LLVM regex library.

In the benchmarking test, a single 5.1GB randomly generated file is fetched
using curl sequentially 100 times from localhost. h2o was configured to
serve static files from a directory. cpp-httplib was configured with a
custom file streaming response handler that I wrote because the library’s
default static file serving code is not optimized for large files.

Time (s) to download 5.1GB file, 100 trials

                    libmicrohttpd  nginx          h2o            cpp-httplib    custom llvm-httpd
average             2.210117216    2.309044575    2.554523523    2.934914591    2.941232402
max                 2.581766925    2.555327036    2.68060518     3.820860995    3.133437672
min                 2.076668873    2.217972741    2.472362859    2.729681188    2.787481511
standard deviation  0.1216443211   0.04583248288  0.03125102196  0.1737419678   0.09295689991

As you can see from the table above, the custom ‘llvm-httpd’ is only 33%
slower on average than the fastest server, libmicrohttpd.
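
For readers who want to reproduce the relative figures, the small sketch below computes the slowdown of each server against libmicrohttpd from the reported averages. The helper `percentSlower` and the constant names are invented for this example; the numbers are the average times from the benchmark results.

```cpp
#include <cassert>

// Percentage by which `seconds` exceeds `baselineSeconds`.
double percentSlower(double seconds, double baselineSeconds) {
  assert(baselineSeconds > 0.0 && "baseline must be positive");
  return (seconds / baselineSeconds - 1.0) * 100.0;
}

// Average download times in seconds, copied from the benchmark results.
constexpr double kLibmicrohttpdAvg = 2.210117216;
constexpr double kNginxAvg = 2.309044575;
constexpr double kH2oAvg = 2.554523523;
constexpr double kCppHttplibAvg = 2.934914591;
constexpr double kLlvmHttpdAvg = 2.941232402;
```

Calling `percentSlower(kLlvmHttpdAvg, kLibmicrohttpdAvg)` gives a value just above 33, and the same call with `kCppHttplibAvg` gives a value just under 33, so the two lightweight options perform about the same on this workload.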

Based on this outcome, we propose to proceed as follows:

   - Move the LLDB IOObject hierarchy code to LLVM itself as explained in the
     previous RFC <https://groups.google.com/g/llvm-dev/c/IbvheetpGKs>.
      - We note that Clangd remote is being built on gRPC, which provides a
        TCP server and HTTP 2.0 client / server library. However, we do not
        plan to use the gRPC protocol, so we were thinking it would be better
        to use the lightweight TCPSocket implementation which is already in
        the LLDB codebase.
   - Implement llvm-debuginfod using cpp-httplib at first, then switch to the
     LLDB TCPSocket code after the refactor to eliminate the cpp-httplib
     dependency.
   - Leave the interface sufficiently modular to enable swapping out with a
     more featureful HTTP server library if needed.

Noah