[llvm-branch-commits] [clang][CallGraphSection] Add call graph section option and docs (PR #117037)

Paul Kirth via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Wed Dec 4 18:19:57 PST 2024


================
@@ -0,0 +1,251 @@
+==================
+Call Graph Section
+==================
+
+Introduction
+============
+
+With ``-fcall-graph-section``, the compiler will create a call graph section 
+in the object file. It will include type identifiers for indirect calls and 
+targets. This information can be used to map indirect calls to their receivers 
+with matching types. A complete and high-precision call graph can be 
+reconstructed by complementing this information with disassembly 
+(see ``llvm-objdump --call-graph-info``).
+
+Semantics
+=========
+
+A coarse-grained, type-agnostic call graph may allow indirect calls to target
+any function in the program. This approach ensures completeness since no
+indirect call edge is missing. However, it is generally poor in precision
+due to having unneeded edges.
+
+A call graph section provides type identifiers for indirect calls and targets.
+This information can be used to restrict the receivers of an indirect target to
+indirect calls with matching type. Consequently, the precision for indirect
+call edges are improved while maintaining the completeness.
+
+The ``llvm-objdump`` utility provides a ``--call-graph-info`` option to extract
+full call graph information by parsing the content of the call graph section
+and disassembling the program for complementary information, e.g., direct
+calls.
+
+Section layout
+==============
+
+A call graph section consists of zero or more call graph entries.
+Each entry contains information on a function and its indirect calls.
+
+An entry of a call graph section has the following layout in the binary:
+
++---------------------+-----------------------------------------------------------------------+
+| Element             | Content                                                               |
++=====================+=======================================================================+
+| FormatVersionNumber | Format version number.                                                |
++---------------------+-----------------------------------------------------------------------+
+| FunctionEntryPc     | Function entry address.                                               |
++---------------------+-----------------------------------+-----------------------------------+
+|                     | A flag whether the function is an | - 0: not an indirect target       |
+| FunctionKind        | indirect target, and if so,       | - 1: indirect target, unknown id  |
+|                     | whether its type id is known.     | - 2: indirect target, known id    |
++---------------------+-----------------------------------+-----------------------------------+
+| FunctionTypeId      | Type id for the indirect target. Present only when FunctionKind is 2. |
++---------------------+-----------------------------------------------------------------------+
+| CallSiteCount       | Number of type id to indirect call site mappings that follow.         |
++---------------------+-----------------------------------------------------------------------+
+| CallSiteList        | List of type id and indirect call site pc pairs.                      |
++---------------------+-----------------------------------------------------------------------+
+
+Each element in an entry (including each element of the contained lists and
+pairs) occupies 64-bit space.
+
+The format version number is repeated per entry to support concatenation of
+call graph sections with different format versions by the linker.
+
+As of now, the only supported format version is described above and has version
+number 0.
+
+Type identifiers
+================
+
+The type for an indirect call or target is the function signature.
+The mapping from a type to an identifier is an ABI detail.
+In the current experimental implementation, an identifier of type T is
+computed as follows:
+
+  -  Obtain the generalized mangled name for “typeinfo name for T”.
+  -  Compute MD5 hash of the name as a string.
+  -  Reinterpret the first 8 bytes of the hash as a little-endian 64-bit integer.
+
+To avoid mismatched pointer types, generalizations are applied.
+Pointers in return and argument types are treated as equivalent as long as the
+qualifiers for the type they point to match.
+For example, ``char*``, ``char**``, and ``int*`` are considered equivalent
+types. However, ``char*`` and ``const char*`` are considered separate types.
----------------
ilovepi wrote:

Is this the same limitation we use for CFI? I think whatever we do here should match those semantics.

https://github.com/llvm/llvm-project/pull/117037


More information about the llvm-branch-commits mailing list