[PATCH] D53379: GSYM symbolication format

Greg Clayton via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 17 09:01:03 PDT 2018


clayborg created this revision.
clayborg added reviewers: markmentovai, zturner.
Herald added subscribers: kristina, jfb, mgrang, JDevlieghere, eraman, aprantl, mgorny.
Herald added a reviewer: JDevlieghere.

I got approval to open source the GSYM symbolication format. This patch still needs tests and still needs to be switched over to use the AsmPrinter to create the GSYM files, but I wanted to post it in progress for the LLVM conference so folks can see what it is and try it out. Full details on the file format are below:

GSYM Introduction

GSYM is a symbolication file format designed to be the best format to use for symbolicating addresses into function name + source file + line information. It is a binary file format designed to be mapped into one or more processes. GSYM information can be created by converting DWARF debug information or Breakpad files. GSYM information can exist as a stand alone file, or be contained in a section of an ELF or mach-o file. When embedded into ELF or mach-o files, GSYM sections can share a string table that already exists within the file.

Why use GSYM?
GSYM files are up to 7x smaller than DWARF files and up to 3x smaller than Breakpad files. The file format is designed to touch as few pages of the file as possible while doing address lookups. GSYM files can be mmap'ed into a process as shared memory allowing multiple processes on a symbolication server to share loaded GSYM pages. The file format includes inline call stack information and can help turn a single address lookup into multiple stack frames that walk the inlined call stack back to the concrete function that invoked these functions.

Converting DWARF Files to GSYM 
`llvm-gsymutil` is available in the `llvm/tools/gsym` directory and has options to convert DWARF into GSYM files. `llvm-gsymutil` has a `-dwarf` option that specifies a DWARF file to convert into a GSYM file. The output file can be specified with the `-out-file` option.

  $ llvm-gsymutil -dwarf /tmp/a.out -out-file /tmp/a.out.gsym

This command will convert a DWARF file into the GSYM file format. This allows clients that are currently symbolicating with DWARF to switch to using the GSYM file format. This tool could be used in a symbolication workflow where symbolication servers convert DWARF to GSYM and cache the results on the fly, or it could be used at build time to always produce a GSYM file. DWARF debug information is rich enough to support encoding the inline call stack information for richer and more useful symbolication backtraces.

Converting Breakpad Files to GSYM

`llvm-gsymutil` has a `-breakpad` option that specifies a Breakpad file to convert into a GSYM file. The output file can be specified with the `-out-file` option.

  $ llvm-gsymutil -breakpad /tmp/foo.sym -out-file /tmp/foo.gsym

This allows clients currently using Breakpad to switch over to GSYM files. This tool could be used in a symbolication workflow where symbolication servers convert Breakpad to GSYM format on the fly only when needed. Breakpad files do not contain inline call stack information, so it is advisable to use `llvm-gsymutil -dwarf` when possible to avoid losing this vital information.

File Format Overview
The GSYM file consists of a header, an address table, an address data offsets table, a file table, a string table and the address data for each address.

The GSYM file format when in a stand alone file is ordered as shown:

- Header
- Address Table
- Address Data Offsets Table
- File Table
- String Table
- Address Data

Header

  #define GSYM_MAGIC 0x4753594d
  #define GSYM_VERSION 1
  struct Header {
    uint32_t magic;
    uint16_t version;
    uint8_t  addr_off_size;
    uint8_t  uuid_size;
    uint64_t base_address;
    uint32_t num_addrs;
    uint32_t strtab_offset;
    uint32_t strtab_size;
    uint8_t  uuid[20];
  };

The magic value is set to `GSYM_MAGIC` and allows quick and easy detection of this file format when it is loaded. Addresses in the address table are stored as offsets from a 64 bit address found in `Header.base_address`. This allows the address table to contain 32, 16 or 8 bit offsets, instead of a table of full sized addresses. A smaller address table makes the file smaller and causes fewer pages to be touched during address lookups. The size of the address offsets in the address table is specified in the header in `Header.addr_off_size`. The header contains a UUID to ensure the GSYM file can be properly matched to the ELF or mach-o object file that produced the stack trace. The header specifies the location of the string table for all strings contained in the GSYM file, or can point to an existing string table within an ELF or mach-o file.
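As an illustration of the checks described above, here is a minimal sketch of reading and validating the header from a mapped buffer. The helper name `readHeader` is hypothetical and not part of this patch, and the sketch assumes the file is stored in the host's byte order and that the struct has no padding (it is 48 bytes on common ABIs).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr uint32_t GSYM_MAGIC = 0x4753594d;
constexpr uint16_t GSYM_VERSION = 1;

struct Header {
  uint32_t magic;
  uint16_t version;
  uint8_t addr_off_size;
  uint8_t uuid_size;
  uint64_t base_address;
  uint32_t num_addrs;
  uint32_t strtab_offset;
  uint32_t strtab_size;
  uint8_t uuid[20];
};
// Assumption: the in-memory layout matches the on-disk layout byte for byte.
static_assert(sizeof(Header) == 48, "Header is assumed to have no padding");

// Hypothetical helper: copies the header out of the buffer and sanity-checks
// the fields. Returns false if the buffer does not look like a GSYM file.
bool readHeader(const uint8_t *data, size_t size, Header &out) {
  assert(data != nullptr && "caller must pass a mapped buffer");
  if (size < sizeof(Header))
    return false;
  // memcpy avoids alignment assumptions about the mapped buffer.
  std::memcpy(&out, data, sizeof(Header));
  if (out.magic != GSYM_MAGIC || out.version != GSYM_VERSION)
    return false;
  // The text above describes 32, 16 or 8 bit address offsets.
  if (out.addr_off_size != 1 && out.addr_off_size != 2 &&
      out.addr_off_size != 4)
    return false;
  // The UUID cannot be longer than the storage reserved for it.
  return out.uuid_size <= sizeof(out.uuid);
}
```

A reader that needs to support files written on a machine with a different byte order would have to byte swap each field after the copy; that case is left out here.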

Address Table
The address table immediately follows the header in the file and consists of `Header.num_addrs` address offsets. These offsets are sorted and can be binary searched for efficient lookups. Each address offset is `Header.addr_off_size` bytes in size. During address lookup, the index of the matching address offset will be the index into the address data offsets table.
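The lookup described above could be sketched as follows. This is not the code from the patch: `readAddrOffset` and `findAddressIndex` are hypothetical names, the table is assumed to be in host byte order, and a "match" is assumed to mean the last entry whose address is less than or equal to the lookup address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads entry `index` from a table whose entries are `size` bytes wide.
uint64_t readAddrOffset(const uint8_t *table, uint8_t size, uint32_t index) {
  const uint8_t *p = table + static_cast<size_t>(index) * size;
  switch (size) {
  case 1:
    return *p;
  case 2: {
    uint16_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
  }
  case 4: {
    uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
  }
  default:
    assert(false && "unsupported addr_off_size");
    return 0;
  }
}

// Binary search for the last entry whose offset is <= (addr - base_address).
// Returns that entry's index, or -1 if addr is below every entry.
int64_t findAddressIndex(const uint8_t *table, uint8_t size,
                         uint32_t num_addrs, uint64_t base_address,
                         uint64_t addr) {
  if (addr < base_address)
    return -1;
  const uint64_t target = addr - base_address;
  uint32_t lo = 0, hi = num_addrs; // search the half-open range [lo, hi)
  while (lo < hi) {
    const uint32_t mid = lo + (hi - lo) / 2;
    if (readAddrOffset(table, size, mid) <= target)
      lo = mid + 1;
    else
      hi = mid;
  }
  // lo is now the number of entries that are <= target.
  return lo == 0 ? -1 : static_cast<int64_t>(lo) - 1;
}
```

The returned index only identifies a candidate. Whether the address really belongs to that entry still depends on the size stored in the address data.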

Address Data Offsets Table
The address data offsets table immediately follows the address table and consists of `Header.num_addrs` 32 bit file offsets: one for each address in the address table. The offsets in this table are the absolute file offset to the address data for each address in the address table. Keeping this data separate from the address table helps to reduce the number of pages that are touched when address lookups occur on a GSYM file.
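To make the layout concrete, here is a rough sketch of locating the address data offsets table and fetching one entry from it. The names are hypothetical, and the sketch assumes a 48 byte header, no padding between the tables, and host byte order. Under those assumptions the 32 bit entries are not guaranteed to be 4 byte aligned, so they are read with `memcpy`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: size of the Header struct on common ABIs.
constexpr uint64_t kHeaderSize = 48;

// File offset where the address data offsets table starts, assuming it is
// packed directly after the header and the address table.
uint64_t addrDataOffsetsStart(uint32_t num_addrs, uint8_t addr_off_size) {
  return kHeaderSize + static_cast<uint64_t>(num_addrs) * addr_off_size;
}

// Fetches the absolute file offset of the address data for entry `index`.
// Returns false if the index or the table entry is out of bounds.
bool addrDataOffset(const uint8_t *file, size_t file_size, uint32_t num_addrs,
                    uint8_t addr_off_size, uint32_t index, uint32_t &out) {
  assert(file != nullptr && "caller must pass a mapped file");
  if (index >= num_addrs)
    return false;
  const uint64_t pos = addrDataOffsetsStart(num_addrs, addr_off_size) +
                       static_cast<uint64_t>(index) * sizeof(uint32_t);
  if (pos + sizeof(uint32_t) > file_size)
    return false;
  // The entry may be unaligned, so copy it out instead of casting.
  std::memcpy(&out, file + pos, sizeof(uint32_t));
  return true;
}
```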

File Table
The file table immediately follows the address data offsets table. The format of the `FileTable` is:

  struct FileTable {
    uint32_t count;
    FileInfo files[];
  };

The file table starts with a 32 bit count of the number of files that are used in all of the address data, followed by that number of `FileInfo` structures.

Each file in the file table is represented with a `FileInfo` structure:

  struct FileInfo {
    uint32_t directory;
    uint32_t filename;
  };

The FileInfo structure has the file path split into a string for the directory and a string for the filename. The directory and filename are specified as offsets into the string table. Splitting paths into directory and file base name allows GSYM to use the same string table entry for common directories.
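For example, rebuilding a full path from a `FileInfo` might look like the sketch below. `fullPath` is a hypothetical helper, and the sketch assumes that strings in the string table are NUL terminated and that `/` is the path separator, neither of which is spelled out above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct FileInfo {
  uint32_t directory; // string table offset of the directory
  uint32_t filename;  // string table offset of the file base name
};

// Hypothetical helper: joins the directory and base name of a file.
// Assumes NUL terminated strings and that the table ends with a NUL.
std::string fullPath(const FileInfo &fi, const char *strtab,
                     uint32_t strtab_size) {
  assert(fi.directory < strtab_size && fi.filename < strtab_size);
  const std::string dir(strtab + fi.directory);
  const std::string base(strtab + fi.filename);
  if (dir.empty())
    return base;
  if (dir.back() == '/')
    return dir + base;
  return dir + "/" + base;
}
```

Two files in the same directory carry the same `directory` offset, so the directory string is stored only once.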

String Table
The string table follows the file table in stand alone GSYM files and contains all strings for everything contained in the GSYM file. Any string data should be added to the string table and any references to strings inside GSYM information must be stored as 32 bit string table offsets into this string table.

Address Data
The address data is the payload that contains information about the address that is being looked up. The structure that represents this data is:

  struct AddressInfo {
      uint32_t size;
      uint32_t name;
      AddressData data[];
  };

It starts with a 32 bit size for the address range of the function and is followed by the 32 bit string table offset for the name of the function. The size of the address range is important to encode as it stops address lookups from matching if the address falls in padding between two functions. This is followed by an array of address data information:
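The size check could be sketched like this. `AddressInfoPrefix`, `readAddressInfoPrefix` and `containsAddress` are hypothetical names for illustration, and the fields are assumed to be in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// The two fixed fields at the start of an AddressInfo.
struct AddressInfoPrefix {
  uint32_t size; // size of the function's address range in bytes
  uint32_t name; // string table offset of the function name
};

// Reads the fixed fields found at absolute file offset `offset`.
bool readAddressInfoPrefix(const uint8_t *file, size_t file_size,
                           uint32_t offset, AddressInfoPrefix &out) {
  assert(file != nullptr && "caller must pass a mapped file");
  if (file_size < sizeof(AddressInfoPrefix) ||
      offset > file_size - sizeof(AddressInfoPrefix))
    return false;
  std::memcpy(&out.size, file + offset, sizeof(uint32_t));
  std::memcpy(&out.name, file + offset + sizeof(uint32_t), sizeof(uint32_t));
  return true;
}

// True if addr lies in [func_start, func_start + size). An address that
// falls in padding after the function fails this check.
bool containsAddress(uint64_t func_start, uint32_t size, uint64_t addr) {
  return addr >= func_start && addr - func_start < size;
}
```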

  struct AddressData {
      uint32_t type;
      uint32_t length;
      uint8_t data[length];
  };

The address data starts with a 32 bit type, followed by a 32 bit length, followed by an array of bytes that encode each specific kind of data.
The `AddressData.type` is an enumeration value:

  enum class InfoType {
     EndOfList = 0u,
     LineTableInfo = 1u,
     InlineInfo = 2u
  };

The `AddressInfo.data[]` is encoded as a vector of `AddressData` structs that is terminated by an `AddressData` struct whose type is set to `InfoType::EndOfList`. This allows the GSYM file format to contain arbitrary data for any address range and allows us to expand the GSYM capabilities as we find more uses for it.

`InfoType::EndOfList` is always the last `AddressData` in the `AddressInfo`.

`InfoType::LineTableInfo` is a modified version of the DWARF line tables that efficiently stores line table information for each function. DWARF stores line table information for an entire source file and includes all functions. Having each function's line table encoded separately allows fewer pages to be touched when looking up the line entry for a specific address. The information is optional and can be omitted for address data that is from a symbol or label where no line table information is available.

`InfoType::InlineInfo` is a format that encodes inline call stacks. This information is optional and doesn't need to be included for each address. If the function has no inlined functions this data should not be included.
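Walking the `AddressData` list could look roughly like the sketch below. `AddressDataRef` and `findAddressData` are hypothetical names, and the sketch assumes entries are packed back to back in host byte order and that the terminating `EndOfList` entry also carries a length field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class InfoType : uint32_t {
  EndOfList = 0u,
  LineTableInfo = 1u,
  InlineInfo = 2u
};

// A view of one AddressData entry inside the mapped file.
struct AddressDataRef {
  InfoType type;
  uint32_t length;
  const uint8_t *data;
};

// Scans the entries starting at `p` for the first one of type `wanted`.
// Returns false if EndOfList is reached first or the buffer is truncated.
bool findAddressData(const uint8_t *p, size_t avail, InfoType wanted,
                     AddressDataRef &out) {
  assert(p != nullptr && "caller must pass a mapped buffer");
  const size_t kPrefix = 2 * sizeof(uint32_t); // type + length
  while (avail >= kPrefix) {
    uint32_t type, length;
    std::memcpy(&type, p, sizeof(type));
    std::memcpy(&length, p + sizeof(type), sizeof(length));
    p += kPrefix;
    avail -= kPrefix;
    if (static_cast<InfoType>(type) == InfoType::EndOfList)
      return false;
    if (length > avail)
      return false; // payload runs past the end of the buffer
    if (static_cast<InfoType>(type) == wanted) {
      out = AddressDataRef{static_cast<InfoType>(type), length, p};
      return true;
    }
    // Not the entry we want: skip its payload using the length field.
    p += length;
    avail -= length;
  }
  return false;
}
```

Because every entry carries its own length, a reader written this way steps over types it does not recognize, which is what makes it possible to add new kinds of data later.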


Repository:
  rL LLVM

https://reviews.llvm.org/D53379

Files:
  include/llvm/DebugInfo/GSYM/Breakpad.h
  include/llvm/DebugInfo/GSYM/DataRef.h
  include/llvm/DebugInfo/GSYM/DwarfTransformer.h
  include/llvm/DebugInfo/GSYM/FileEntry.h
  include/llvm/DebugInfo/GSYM/FileTableCreator.h
  include/llvm/DebugInfo/GSYM/FileWriter.h
  include/llvm/DebugInfo/GSYM/FunctionInfo.h
  include/llvm/DebugInfo/GSYM/GsymCreator.h
  include/llvm/DebugInfo/GSYM/GsymReader.h
  include/llvm/DebugInfo/GSYM/GsymStreamer.h
  include/llvm/DebugInfo/GSYM/InlineInfo.h
  include/llvm/DebugInfo/GSYM/LineEntry.h
  include/llvm/DebugInfo/GSYM/LineTable.h
  include/llvm/DebugInfo/GSYM/LookupResult.h
  include/llvm/DebugInfo/GSYM/StringTable.h
  include/llvm/DebugInfo/GSYM/StringTableCreator.h
  include/llvm/Support/DataExtractor.h
  lib/DebugInfo/CMakeLists.txt
  lib/DebugInfo/GSYM/Breakpad.cpp
  lib/DebugInfo/GSYM/CMakeLists.txt
  lib/DebugInfo/GSYM/DwarfTransformer.cpp
  lib/DebugInfo/GSYM/FileTableCreator.cpp
  lib/DebugInfo/GSYM/FileWriter.cpp
  lib/DebugInfo/GSYM/FunctionInfo.cpp
  lib/DebugInfo/GSYM/GsymCreator.cpp
  lib/DebugInfo/GSYM/GsymReader.cpp
  lib/DebugInfo/GSYM/GsymStreamer.cpp
  lib/DebugInfo/GSYM/InlineInfo.cpp
  lib/DebugInfo/GSYM/LineTable.cpp
  lib/DebugInfo/GSYM/README.md
  tools/gsym/CMakeLists.txt
  tools/gsym/llvm-gsymutil.cpp
  unittests/DebugInfo/CMakeLists.txt

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D53379.170018.patch
Type: text/x-patch
Size: 158523 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20181017/6060ee7d/attachment-0001.bin>

