[llvm-dev] [RFC] Adding Binary ID into LLVM Profiles

Gulfem Savrun Yeniceri via llvm-dev llvm-dev at lists.llvm.org
Mon Jun 14 17:47:28 PDT 2021


There is no direct way of associating binaries with the corresponding
profiles in LLVM. Therefore, source code coverage processing requires an
additional post-processing step to match the executables to their
associated profiles. To improve this, we propose embedding binary IDs into
profiles so that we can uniquely identify a profile and easily find the
relevant binary.

Binary ID

We use the name binary ID to refer to the unique identifiers used in
binaries in different file formats. Build ID
<https://fedoraproject.org/wiki/Releases/FeatureBuildId> is a unique
identifier for the build that is included in the ELF file format. It was
originally introduced in the GNU toolchain, and is used for various purposes,
such as associating binaries with core dumps. Build ID is optional, and can
be enabled with the -Wl,--build-id option. To the best of our knowledge,
similar unique identifiers are used in other file formats. For example, a
unique identifier called LC_UUID is used in Mach-O, and similarly a GUID
(Globally Unique Identifier) is used in COFF.
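
To make the ELF case concrete, here is a minimal sketch of pulling a build ID
out of raw ELF note data. It relies on the standard note layout (name size,
descriptor size, type, then name and descriptor each padded to 4 bytes) and on
the GNU build ID note having type 3 and owner name "GNU". The function name
find_build_id and the sample bytes are illustrative assumptions, not part of
any LLVM API.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for GNU build IDs


def _align4(n):
    """Round n up to the next multiple of 4 (ELF note padding)."""
    return (n + 3) & ~3


def find_build_id(notes, little_endian=True):
    """Return the build ID bytes found in raw ELF note data, or None."""
    fmt = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(notes):
        namesz, descsz, ntype = struct.unpack_from(fmt, notes, offset)
        offset += 12
        name = notes[offset:offset + namesz]
        offset += _align4(namesz)
        desc = notes[offset:offset + descsz]
        offset += _align4(descsz)
        if len(desc) != descsz:
            return None  # truncated note data
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc
    return None


# Hand-built note with a made-up 4-byte ID, for illustration only.
example_note = (
    struct.pack("<III", 4, 4, NT_GNU_BUILD_ID)
    + b"GNU\x00"
    + bytes.fromhex("deadbeef")
)
```

Real build IDs are usually longer (for example 20 bytes for a SHA-1 based ID),
but the parsing is the same because the length comes from the note header.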


Clang supports profiling with instrumentation for two main purposes:


   Front-end instrumentation, where the compiler front-end inserts
   instrumentation for collecting source code coverage.

   IR-level instrumentation, where LLVM inserts instrumentation during
   optimizations for PGO (Profile-Guided Optimization).

Profiling inserts instrumentation code into binaries, which will be used by
compiler-rt (compiler runtime) during execution. When the instrumented binary
executes, it will write a raw profile (.profraw). Multiple raw profiles are
merged together using the llvm-profdata
<https://llvm.org/docs/CommandGuide/llvm-profdata.html> tool. At the end, a
single indexed profile is created (.profdata) that is used to generate
source code coverage reports.
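
The workflow above can be sketched as three command lines: build with
front-end instrumentation, merge the raw profiles, and generate a report. The
helper below only assembles the argument lists and does not run anything. The
function name coverage_commands and the file names are assumptions for
illustration; the clang, llvm-profdata and llvm-cov flags are the usual ones
for source-based coverage.

```python
def coverage_commands(source, binary, raw_profiles, indexed_profile):
    """Build the argv lists for a typical source-based coverage run."""
    # Compile with front-end instrumentation and coverage mapping.
    build = ["clang", "-fprofile-instr-generate", "-fcoverage-mapping",
             source, "-o", binary]
    # Merge one or more .profraw files into a single indexed .profdata file.
    merge = ["llvm-profdata", "merge", "-sparse", *raw_profiles,
             "-o", indexed_profile]
    # The report step needs both the indexed profile and the matching binary.
    report = ["llvm-cov", "show", binary,
              f"-instr-profile={indexed_profile}"]
    return build, merge, report


build, merge, report = coverage_commands(
    "foo.c", "./foo", ["a.profraw", "b.profraw"], "foo.profdata")
```

Note that the last step takes the binary as a separate input, which is why a
profile has to be matched back to its executable in the first place.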

The profile format consists of two major parts:


   The profile header includes the magic number and version (and, in the raw
   profile, the padding and size of each section).

   The profile data includes the function name and hash, and pointers to
   three per-function sections: counters, names and value profiling counters.


We propose adding the build ID, which is the unique binary ID in ELF, into
profiles to improve the source code coverage post-processing step. Although we
target the ELF file format, we are proposing a design that can be leveraged
and extended for other file formats, such as Mach-O and COFF.

Extending profile format

We need to extend both the raw and indexed profile formats to include the
build ID. Since a build ID does not have a fixed length, we will add a
variable-length byte array at the end of the profile formats. We will also
change the compiler-rt profiling runtime for ELF platforms to read build IDs
from ELF data in memory and write them into the raw profile.
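
One possible way to store variable-length IDs is a length-prefixed byte array.
The sketch below is only an assumption about what such a trailing section
could look like, not the layout this RFC commits to: a 64-bit total size,
followed by entries that each hold a 64-bit length and the ID bytes padded to
8 bytes. The names pack_binary_ids and unpack_binary_ids are hypothetical.

```python
import struct


def _pad8(n):
    """Number of zero bytes needed to reach the next multiple of 8."""
    return (-n) % 8


def pack_binary_ids(ids):
    """Serialize a list of binary IDs as a length-prefixed section."""
    body = b""
    for binary_id in ids:
        body += struct.pack("<Q", len(binary_id))
        body += binary_id + b"\x00" * _pad8(len(binary_id))
    # The leading size lets a reader skip or bound the whole section.
    return struct.pack("<Q", len(body)) + body


def unpack_binary_ids(blob):
    """Parse a section produced by pack_binary_ids back into a list."""
    (total,) = struct.unpack_from("<Q", blob, 0)
    ids, offset, end = [], 8, 8 + total
    while offset < end:
        (length,) = struct.unpack_from("<Q", blob, offset)
        offset += 8
        ids.append(blob[offset:offset + length])
        offset += length + _pad8(length)
    return ids
```

Prefixing each ID with its length keeps the reader independent of the ID
width, so the same section could carry a 20-byte ELF build ID or a 16-byte
UUID without a format change.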

Extending profiling tools

Since the profile format changes, we also need to extend the tools that
process profiles. We need to extend the ProfileData library functions that
the llvm-profdata tool uses to operate on profiles, and add support for
printing binary IDs in the profiles.
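
Once a tool can list the binary IDs stored in a profile, matching the profile
to its binaries becomes a simple lookup. The following sketch assumes the IDs
are already available as byte strings and that an index from build ID to
binary path has been built separately; match_profile_to_binaries and the paths
are made up for illustration.

```python
def match_profile_to_binaries(profile_ids, binaries_by_id):
    """Map each binary ID (as hex) to a known binary path, or None."""
    return {
        binary_id.hex(): binaries_by_id.get(binary_id)
        for binary_id in profile_ids
    }


# Hypothetical index of locally available binaries, keyed by build ID.
index = {bytes.fromhex("deadbeef"): "out/foo"}
matches = match_profile_to_binaries(
    [bytes.fromhex("deadbeef"), bytes.fromhex("0102")], index)
```

IDs that map to None are the ones for which no local binary is known.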

Future Work

Embedding binary IDs into profiles would also enable implementing support
for the debuginfod <https://sourceware.org/elfutils/Debuginfod.html> library
in llvm-cov
<https://lists.llvm.org/pipermail/llvm-dev/2020-August/144708.html>,
where the tool will automatically download binaries corresponding to input
profiles.
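
For reference, debuginfod servers expose executables under a path of the form
/buildid/BUILDID/executable, where BUILDID is the lowercase hex build ID. A
small sketch of building such a URL is shown below; the function name and the
server address are placeholders, and how llvm-cov would actually integrate
with debuginfod is left open here.

```python
def debuginfod_executable_url(server, build_id):
    """Return the debuginfod URL for the executable with this build ID."""
    return f"{server.rstrip('/')}/buildid/{build_id.hex()}/executable"


# Placeholder server and a made-up 4-byte build ID.
url = debuginfod_executable_url(
    "https://debuginfod.example.com/", bytes.fromhex("deadbeef"))
```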


Please let us know if you have any suggestions or questions.
