[llvm-dev] A Section-Metadata-Based Approach for Mapping Binary Addresses to Machine Basic Blocks

Fri Jul 17 13:30:12 PDT 2020

Greetings,

TLDR: We propose emitting a new section in the binary which would contain
information required to associate executable PC addresses to basic blocks.
The new proposal decouples the BB information from the symbol table,
allowing for more flexibility (stripping the section, adding new BB
information fields). Furthermore, the new approach cuts the binary size
overhead by ~60% compared to the current implementation (using special BB
symbols to pass the information).
Background

Today Propeller (proposed in
https://lists.llvm.org/pipermail/llvm-dev/2019-September/135393.html) uses
the *-fbasicblock-sections=labels* option to create a symbol for every
basic block to be able to map instruction addresses to individual basic
blocks. To lower the .strtab bloat, these symbols are encoded using a unary
naming scheme which allows for string compression. For instance, a function
“foo” with four basic blocks will have three basic block labels (the entry
block doesn’t need a BB label since it can be found using the function
symbol).

foo:

  …

  je ra.BB.foo

a.BB.foo:

  …

  je rra.BB.foo

ra.BB.foo:

  …

  ret

rra.BB.foo:

  …

  ret

While this serves the functionality of mapping addresses to basic blocks,
it poses several challenges:

   1.

   Unary symbol names are great for compression, but require changes to the
   demangler for readability.
   2.

   Stripping BB symbols from the binary requires special linker support as
   they are placed in the symbol table along with other symbols.
   3.

   Profilers and debuggers need to be changed to accommodate these symbols.
   For instance, the profiles from all BBs of a function must be aggregated
   and the debugger must be able to map to the function symbol, rather than
   the BB symbol, to show a function-level backtrace.
   4.

   Relying on the ELF symbol format provides little flexibility for
   extension of BB information for passing other information about basic
   blocks (e.g., whether this is a return block, or an exception handling
   block). Today Propeller encodes this information using special characters
   in the symbol name (‘r’ for return blocks as in the above example).

The BB Info Section Design

To solve these problems, we propose emitting the BB information in a
separate “.bb_info” section. Each function will have its own BB info table
emitted into this section. The structure of this table is as follows. The
table has a header which includes the address of the function and the
number of basic blocks in that function.

+-----------------------------------------------+

|  Address of the function                      |   -> 8 bytes (pointer
size)

+-----------------------------------------------+

|  Number of basic blocks in this function (>0) |   -> ULEB128

+-----------------------------------------------+

The header is followed by the BB information record for every basic block.
These records are ordered in the same order as MBBs are placed in the
function. Each BB Info is structured as follows:

+-------------------------------------------------------+

|  Offset of the basic block relative to function begin |   -> ULEB128

+-------------------------------------------------------+

|  Binary size of the basic block                       |   -> ULEB128

+-------------------------------------------------------+

|  BB metadata                                          |

| [MBB.isReturn() | MBB.hasTailCall() << 1  | MBB.isEHPad() << 2 ]  |  ->
ULEB128

+-------------------------------------------------------+

The BB record for the entry basic block excludes the offset field, as it is
always zero.

Along with this information, Propeller needs the corresponding MBB number
of every BB info record. Instead of emitting another field in the record,
we implicitly derive the index assuming that MBBs are renumbered from 0 to
N-1 prior to bb-info section emission.

Remapping PC addresses to BBs using the .bb_info section

Propeller can use the BB Info table, along with function addresses to map
sampled PCs to their corresponding basic blocks. Given an address X, the
workflow is as follows:

   1.

   Use the symbol table to find the function F containing X.
   2.

   Lookup the BB info table for function F.
   3.

   Use the offset fields of the BB info table to find the BB which contains
   address X.

Complexity

All steps can use binary search.

   1.

   Step 1 uses a binary search into the symbol table and has time
   complexity of O(lg N), with N being the number of symbols in the binary.
   2.

   Step 2 also uses binary search to find the BB info table corresponding
   to a function address and it has time complexity of O(lg M), with M being
   the number of functions.
   3.

   Finally, step 3 uses a binary search into the BB info table to find the
   offset containing the address offset relative to the function entry and has
   time complexity of O(lg N_F), with N_F being the number of basic blocks in
   function F.

Advantage over the BB symbols approach

The section-based approach solves the problems mentioned above by
decoupling the BB information from the symbol table. Furthermore, the new
approach significantly reduces the binary and object size overheads. The
following table lists the total binary size of Clang for vanilla,
BB-symbols, and BB-info builds, all using PGO.

Flavor

Binary size

PGO

86MB

PGO with BB Info section

97MB

PGO with BB symbols

146MB

We see that using the .bb_info section increases the binary size by about
13% while BB symbols (with strtab compression) amounts to about 70% binary
size (60MBs) overhead. Most of this overhead comes from the .symtab bloat
(56MBs). The .strtab bloat is lower (4MB) due to string compression.

Implementation and challenges

Our implementation of the .bb_info section is inspired by the .stack_sizes
section feature in LLVM. Each function will have its BB info table emitted
into a section which is linked to the section containing that function (The
BB info section uses ELF:SHF_LINK_ORDER and is linked to the beginning
symbol of the section containing that function). With -function-sections this
emits a unique .bb_info section for each function (we don’t need a unique
id for sections since they are  already distinguished by the linked-to
symbol). Eventually, these sections are concatenated by the linker in an
order consistent with how their function sections are laid out.

Several linker and compiler features may affect the .bb_info section:

   1.

   COMDATs: For COMDAT functions, we add their .bb_info sections to the
   COMDAT group too, so that conflicts between .bb_info sections of the
   COMDAT group are resolved the same way as it is resolved for the function
   sections. Besides this, since we make the BB info table header of every
   function refer not to the global function symbol, but to the local
   .Lfunc_begin symbol, the treatment of the bb info section for every
   function is independent of symbol resolution for COMDAT functions (This is
   the same as how stack_sizes section is emitted:
   https://reviews.llvm.org/D44669).

   1.

   Garbage Collection: If a function section is garbage-collected by the
   linker its .bb_info section must be removed as well, since the section
   only refers to that particular function.

   1.

   Code Folding: If two functions are identical, their section is folded by
   identical code folding and their .bb_info sections must be folded as
   well. This is a safe assumption as the .bb_info sections should be
   identical as well, except for the function symbol that they refer to, and
   ICF can correctly identify that those function symbols are folded together.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200717/137e6d32/attachment.html>