<div dir="ltr">Hi all,<br><br>Please see the updates to the RFC below.<br><br>The terms “type hash”, “type identifier”, and “type id” are used <br>interchangeably.<br><br>The original RFC describes a use case, which is efficient stack trace<br>collection and reconstruction via offline reverse call graph<br>traversal. Besides call graph storing and reconstruction as described<br>in this RFC, additional LLVM features will be introduced for this use<br>case including 1) ASan features to collect and report compressed stack<br>traces, 2) a stand-alone tool to reconstruct stack traces based on the<br>compressed trace and the call graph. A separate RFC will be sent<br>shortly for these.<br><br>Updates<br>=======<br>  - Added implementation details on computing and storing call graph <br>    (see “Implementation” section).<br>  - Added details on restoring the call graph (see “Updates on <br>    restoring the call graph” section).  Proposed to introduce an <br>    option to llvm-objdump (–call-graph-info) instead of having a new <br>    stand-alone tool.  Added the output layout.<br><br><br>Implementation<br>==============<br><br>The construction scheme will consist of the following parts:<br><br>1. Creating type ids for indirect calls and targets at clang front-end<br><br>    Generalized type identifiers will be computed for each indirect<br>    call and target using<br>    clang::CodeGenModule::CreateMetadataIdentifierGeneralized().<br>    These type ids will be passed to the middle-end as metadata.  The<br>    type information is deliberately computed at the front-end from<br>    C/C++ types instead of IR types due to two main reasons.  First,<br>    it provides extra precision since some type information is lost at<br>    IR.  Second, it ensures consistent type identifiers between object<br>    files compiled at different times since C/C++ standards require<br>    language-level types to match.<br><br><br>2. Propagating indirect call type ids from the middle-end (LLVM IR) to<br>   the back-end (machine instructions at AsmPrinter) with CallSiteInfo<br><br>    To form the call graph section, the type ids for indirect calls<br>    and targets are needed at the machine-instruction level.  However,<br>    the metadata is not propagated from LLVM IR to machine<br>    instructions.  For indirect targets, machine functions can be<br>    mapped back to IR functions.  This enables retrieving the type id<br>    for a machine function from the metadata of the IR function.<br>    However, there is no such machine-to-IR mapping for arbitrary<br>    instructions.  To address this,  CallSiteInfo (originally used for<br>    mapping call arguments to forwarding registers) will be extended<br>    to include call type ids, so that, they can be retrieved at the<br>    back-end for indirect calls.<br><br>3. Creating comdats for functions whose symbols will be referenced<br>   from the call graph section<br><br>    Comdats will be used to enable dead-stripping of the call graph<br>    entries.  An LLVM pass will be implemented and enabled to create a<br>    comdat for each function with at least one call graph entry.  The<br>    entries of each function will be emitted to the section created<br>    using its comdat.  As a result, the call graph entries will be<br>    discarded if the linked functions get removed.<br><br>4. Emitting call graph section<br><br>    The type ids will be retrieved and the call graph sections will be<br>    created at the assembly printer as described above.  A special<br>    type id (0) will be assigned to indirect targets lacking the<br>    metadata (possibly due to metadata loss), so that a complete list<br>    of indirect targets can be recovered from the call graph section.<br>    This makes indirect targets distinguishable from others, which<br>    provides additional precision.  Labels will be emitted for<br>    indirect targets and calls, and their values will be emitted to<br>    the call graph section as pointers to these labels.  Gathering all<br>    these pieces, the call graph sections will be constructed at the<br>    assembly printer following the described layout.<br><br><br><br>Updates on restoring the call graph<br>===================================<br><br>Instead of creating a new stand-alone LLVM tool, a new feature will be<br>introduced to llvm-objdump (--call-graph-info) to extract and<br>serialize the call graph from the binary.  This will enable code reuse<br>for disassembly.<br><br>The output will have the following layout:<br><br>  INDIRECT TARGETS TYPES<br>  TypeId1 FuncEntryAddr1 FuncEntryAddr2 ... FuncEntryAddrN<br>  TypeId2 ...<br><br>  INDIRECT CALLS TYPES<br>  TypeId1 ICallSite1 ICallSite2 ... ICallSiteN<br>  TypeId2 ...<br><br>  INDIRECT CALL SITES<br>  CallerFuncEntryAddr1 ICallSite1 ICallSite2 ... ICallSiteN<br>  CallerFuncEntryAddr2 ...<br><br>  DIRECT CALL SITES<br>  CallerFuncEntryAddr1 DCallSite1 DCallTarget1 ... DCallSiteN DCallTargetN<br>  CallerFuncEntryAddr2 ...<br><br>  FUNCTION SYMBOLS<br>  FuncEntryAddr1 FuncName1<br>  FuncEntryAddr2 FuncName2<br>  ...<br><br>The information for the indirect target and call type identifiers will<br>be retrieved from the call graph section. The call sites will be<br>extracted from the disassembly. The function symbols will be extracted<br>from the symbol table.<br><br>If an indirect call has no type id, it will still appear in “INDIRECT<br>CALL SITES” list but not in  “INDIRECT CALL TYPES” list.  In such<br>cases, receiver edges can be drawn to all indirect targets from such<br>call sites for completeness of the call graph, <br><br>Indirect targets without a type id are assigned a special id (0) in<br>the call graph section construction.  For completeness, these<br>functions can be taken as receiver to any indirect call.<div><br></div><div>Best,</div><div>Necip Fazil Yildiran<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jun 10, 2021 at 3:48 PM Necip Yildiran <<a href="mailto:necip@google.com">necip@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi all,<br>

<br>

Please find the RFC for computing, storing, and restoring conservative<br>

call graphs with LLVM below.<br>

<br>

This is an early version of the proposed feature, and we are still<br>

exploring the ways to implement it.  Comments and suggestions are<br>

welcome.<br>

<br>

<br>

Summary<br>

=======<br>

<br>

Inferring indirect call targets from a binary is challenging without<br>

source-level information.  Hence, reconstruction of a fine-grained<br>

call graph from the binary is unfeasible for indirect/virtual calls.<br>

To address this, we propose to implement a feature in LLVM to collect<br>

the necessary information to construct the call graph later while the<br>

source information is present (i.e., during compilation), and store it<br>

in a non-code section of the binary.  Additionally, we propose to<br>

implement a stand-alone tool, llvm-discg, or a feature in llvm-objdump<br>

to extract and serialize the call graph.  Together, these will allow<br>

recovering a conservative (i.e., possibly bigger than actual to avoid<br>

missing information) and high-precision call graph only from the<br>

binary.<br>

<br>

In the following, we discuss the motivation for the proposed changes,<br>

and describe how we plan to construct, store, and reconstruct the call<br>

graph.<br>

<br>

<br>

Background and motivation<br>

=========================<br>

<br>

Source code provides rich semantic information such as types, which is<br>

mostly lost when it is compiled to a binary.  A coarse-grained call<br>

graph can be recovered when indirect calls are allowed to target any<br>

function in the program, resulting in poor precision.  On the other<br>

hand, source-level information can easily increase the precision of<br>

the call graphs, e.g., by restricting the target of an indirect call<br>

to functions with matching type.<br>

<br>

It is infeasible to fully reconstruct the call graph for the binary<br>

even with the source in hand.  The call graph for the source is<br>

possibly different than for the binary due to removed or inserted<br>

function calls, e.g., tail call optimizations or new function calls to<br>

libc.  Therefore, construction of a call graph during compilation is<br>

favorable.  Since the call graph belongs to a specific compilation of<br>

the source, a non-code section in the binary is the right scope to<br>

store such information.<br>

<br>

<br>

Use case<br>

========<br>

<br>

Many program analysis tools (bug detectors, profilers, etc.) collect<br>

call stack traces during execution, to later present them in<br>

diagnostic reports.  In some cases, like with AddressSanitizer [1],<br>

stack trace collection is not a performance critical operation since<br>

the rest of the tool’s overhead is much higher.  In other cases, such<br>

as statistical heap profilers, it is also not a performance bottleneck<br>

since the profilers sample a small fraction of events.<br>

<br>

For memory safety error detectors that use the upcoming Arm Memory<br>

Tagging Extension (Arm MTE, [2]), however, stack trace collection will<br>

become a significant bottleneck, with respect to CPU and RAM usage,<br>

because stack traces will need to be collected on every malloc() and<br>

free() call. Reducing this overhead will enable collecting stack<br>

traces and improve quality of diagnostics for MTE-enabled dynamic<br>

analysis in production.<br>

<br>

Our goal is to achieve efficient stack trace collection and<br>

reconstruction via offline reverse call graph traversal.  It involves<br>

emitting stack trace hash (instead of the full trace) during runtime;<br>

and reconstructing the full stack trace offline from a stack trace<br>

hash and a call graph when needed.  The proposed LLVM features will<br>

enable this by reconstructing an accurate call graph from the binary.<br>

<br>

<br>

Construction of the conservative call graph<br>

===========================================<br>

<br>

The call graph for direct calls is trivial to obtain.  For indirect or<br>

virtual calls, the conservative assumption will be made: all functions<br>

with matching signature (with relaxations, e.g., see<br>

-fsanitize-cfi-icall-generalize-pointers [3]) will be assumed to be a<br>

possible target.  To further narrow down, a function will not be<br>

considered as a target unless it has external linkage or its address<br>

is taken.  Further narrowing for C++ virtual calls can be done later.<br>

<br>

The possible targets of indirect calls will be split into disjoint<br>

classes of unique function signatures.  In the call graph, an indirect<br>

call will have an edge to the receiver function class with matching<br>

signature to reflect the possible targets for the call.<br>

<br>

The call graph will be saved in a non-code section of the binary,<br>

i.e., .callgraph.  The functions and call sites will be identified<br>

with unique function entry and call site labels in the machine code,<br>

which can be translated into instruction addresses.  The hash of<br>

function signatures will be used to identify and refer to the disjoint<br>

classes of indirect targets.<br>

<br>

To achieve this in LLVM, a new flag will be introduced, which<br>

constructs and emits the call graph in the compiler backend, possibly<br>

implemented in llvm/lib/codeGen/AsmPrinter/AsmPrinter.cpp.<br>

<br>

<br>

Call graph layout in the binary<br>

===============================<br>

<br>

For direct calls, no additional information will be stored in the<br>

binary.  For indirect calls, the call graph will be represented in two<br>

pieces:<br>

<br>

  * a non-code section, .callgraph, mapping the function signature<br>

hashes to the list of target functions entries and indirect/virtual<br>

call sites.  The first word is a version number to account for<br>

potential changes in the format.<br>

<br>

 * labels in .text, indicating the target function entries and<br>

indirect/virtual call sites. The .callgraph section will have pointers<br>

to these labels.  This will not affect the executable code.<br>

<br>

<br>

The layout is as follows:<br>

<br>

<.callgraph section><br>

FormatVersionNumber, TypeHash1,<br>

FuncEntryLabelCount1, .LFunc1Entry, .LFunc2Entry, ..., .LFuncNEntry,<br>

CallSiteLabelCount1,  .LIndirectCallSite1, ..., .LIndirectCallSiteM,<br>

FormatVersionNumber, TypeHash2, …<br>

FuncEntryLabelCount2, ...,<br>

CallSiteLabelCount2,  ...,<br>

...<br>

<br>

<.text section><br>

...<br>

.LFunc1Entry:           ; unique label for target function entry for func1()<br>

    push %rbp<br>

    ...<br>

<br>

.LIndirectCallSite1     ; unique label for indirect function call 1<br>

    callq Y(%rbp)<br>

...<br>

<br>

<br>

Because the linker may concatenate .callgraph sections with different<br>

format version numbers, we repeat the format version number for each<br>

TypeHash list.  This avoids any special logic for the .callgraph<br>

section in the linker.<br>

<br>

<br>

Restoring the call graph<br>

========================<br>

<br>

To extract the call graph from the binary and serialize, a new LLVM<br>

tool, llvm-discg, will be implemented using LLVM disassembler library,<br>

or a new feature will be introduced into llvm-objdump.  The direct<br>

call graph will be reconstructed from the disassembly without needing<br>

the stored call graph data.  The indirect call graph will be<br>

reconstructed from the .callgraph section and the disassembly.<br>

<br>

For the indirect call graph, the labels in the .callgraph section will<br>

be resolved to instruction addresses.  Then, disjoint classes of<br>

function entries and call sites will be recovered based on their type<br>

hashes.  For a call site with type hash H, edges will be made to all<br>

function entries with the same type hash H.  Repeating the procedure<br>

for all call sites, the conservative indirect call graph will be<br>

reconstructed.<br>

<br>

<br>

Future Direction<br>

================<br>

<br>

The construction mode of the call graph can be parametrized to tune<br>

the trade-off between the completeness and soundness of the call graph<br>

based on additional heuristics.<br>

<br>

<br>

References<br>

==========<br>

<br>

[1] <a href="https://clang.llvm.org/docs/AddressSanitizer.html" rel="noreferrer" target="_blank">https://clang.llvm.org/docs/AddressSanitizer.html</a><br>

[2] <a href="https://security.googleblog.com/2019/08/adopting-arm-memory-tagging-extension.html" rel="noreferrer" target="_blank">https://security.googleblog.com/2019/08/adopting-arm-memory-tagging-extension.html</a><br>

[3] <a href="https://clang.llvm.org/docs/ControlFlowIntegrity.html#fsanitize-cfi-icall-generalize-pointers" rel="noreferrer" target="_blank">https://clang.llvm.org/docs/ControlFlowIntegrity.html#fsanitize-cfi-icall-generalize-pointers</a><br>

<br>

<br>

Best,<br>

Necip Fazil Yildiran<br>

</blockquote></div>