[all-commits] [llvm/llvm-project] c36506: Make GSYM 64 bit safe and add a new version 2 of t...

Roy Shi via All-commits all-commits at lists.llvm.org
Tue Apr 14 18:56:54 PDT 2026


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: c3650687e0b7317b686b9187d15d2c4c63e05f8b
      https://github.com/llvm/llvm-project/commit/c3650687e0b7317b686b9187d15d2c4c63e05f8b
  Author: Roy Shi <royitaqi at users.noreply.github.com>
  Date:   2026-04-14 (Tue, 14 Apr 2026)

  Changed paths:
    M llvm/include/llvm/DebugInfo/GSYM/CallSiteInfo.h
    M llvm/include/llvm/DebugInfo/GSYM/ExtractRanges.h
    M llvm/include/llvm/DebugInfo/GSYM/FileEntry.h
    M llvm/include/llvm/DebugInfo/GSYM/FileWriter.h
    M llvm/include/llvm/DebugInfo/GSYM/FunctionInfo.h
    A llvm/include/llvm/DebugInfo/GSYM/GlobalData.h
    M llvm/include/llvm/DebugInfo/GSYM/GsymCreator.h
    A llvm/include/llvm/DebugInfo/GSYM/GsymCreatorV1.h
    A llvm/include/llvm/DebugInfo/GSYM/GsymCreatorV2.h
    A llvm/include/llvm/DebugInfo/GSYM/GsymDataExtractor.h
    M llvm/include/llvm/DebugInfo/GSYM/GsymReader.h
    A llvm/include/llvm/DebugInfo/GSYM/GsymReaderV1.h
    A llvm/include/llvm/DebugInfo/GSYM/GsymReaderV2.h
    A llvm/include/llvm/DebugInfo/GSYM/GsymTypes.h
    M llvm/include/llvm/DebugInfo/GSYM/Header.h
    A llvm/include/llvm/DebugInfo/GSYM/HeaderV2.h
    M llvm/include/llvm/DebugInfo/GSYM/InlineInfo.h
    M llvm/include/llvm/DebugInfo/GSYM/LineTable.h
    M llvm/include/llvm/DebugInfo/GSYM/MergedFunctionsInfo.h
    M llvm/include/llvm/DebugInfo/GSYM/StringTable.h
    M llvm/lib/DebugInfo/GSYM/CMakeLists.txt
    M llvm/lib/DebugInfo/GSYM/CallSiteInfo.cpp
    M llvm/lib/DebugInfo/GSYM/DwarfTransformer.cpp
    M llvm/lib/DebugInfo/GSYM/ExtractRanges.cpp
    M llvm/lib/DebugInfo/GSYM/FileWriter.cpp
    M llvm/lib/DebugInfo/GSYM/FunctionInfo.cpp
    A llvm/lib/DebugInfo/GSYM/GlobalData.cpp
    M llvm/lib/DebugInfo/GSYM/GsymCreator.cpp
    A llvm/lib/DebugInfo/GSYM/GsymCreatorV1.cpp
    A llvm/lib/DebugInfo/GSYM/GsymCreatorV2.cpp
    M llvm/lib/DebugInfo/GSYM/GsymReader.cpp
    A llvm/lib/DebugInfo/GSYM/GsymReaderV1.cpp
    A llvm/lib/DebugInfo/GSYM/GsymReaderV2.cpp
    M llvm/lib/DebugInfo/GSYM/Header.cpp
    A llvm/lib/DebugInfo/GSYM/HeaderV2.cpp
    M llvm/lib/DebugInfo/GSYM/InlineInfo.cpp
    M llvm/lib/DebugInfo/GSYM/LineTable.cpp
    M llvm/lib/DebugInfo/GSYM/MergedFunctionsInfo.cpp
    M llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
    M llvm/tools/llvm-gsymutil/Opts.td
    M llvm/tools/llvm-gsymutil/llvm-gsymutil.cpp
    M llvm/unittests/DebugInfo/GSYM/CMakeLists.txt
    M llvm/unittests/DebugInfo/GSYM/GSYMTest.cpp
    A llvm/unittests/DebugInfo/GSYM/GSYMV2Test.cpp
    A llvm/unittests/DebugInfo/GSYM/GsymDataExtractorTest.cpp

  Log Message:
  -----------
  Make GSYM 64 bit safe and add a new version 2 of the GSYM files (#190353)

# Motivation

GSYM files are approaching the point where they need 64 bit file
offsets. We also want to add more global data to GSYM files. Right now
the GSYM file format is:
```
Header
AddressOffsets
AddressInfoOffsets
FileTable
StringTable
FunctionInfos
```
The `AddressOffsets`, `AddressInfoOffsets` and `FileTable` always
immediately follow the Header. The `StringTable` is pointed to by the
header, which uses 32 bit integers for the string table file offset and
file size. The `AddressInfoOffsets` are fixed at 32 bits as well. So
with the current format, we can't have any string or function info with
an offset >= 4 GiB.
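
The limit above is just the range of a 32 bit unsigned integer. A minimal
sketch of the check, where `fitsIn32Bits` and `kFourGiB` are hypothetical
names used only for illustration and are not part of the GSYM code:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if a file offset can be stored in a 32 bit
// field, as the V1 header and AddressInfoOffsets table require.
constexpr bool fitsIn32Bits(uint64_t Offset) {
  return Offset <= std::numeric_limits<uint32_t>::max();
}

// 4 GiB is the first offset that no longer fits.
constexpr uint64_t kFourGiB = 1ULL << 32;

static_assert(fitsIn32Bits(kFourGiB - 1), "last offset that V1 can encode");
static_assert(!fitsIn32Bits(kFourGiB), "first offset that V1 cannot encode");
```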

# GSYM V2 design (64 bit safe and extensible)

This new design increments the GSYM version to 2 and adds a new
`GlobalInfoType` enum, which lets the file specify a 64 bit safe file
offset and file size for every item in the global data table. Everything
is now in the global info data (listed below). The new design is
extensible: new global info types can be added in the future, and the
order in which they appear in the file can be changed/optimized.

* UUID (optional)
* AddressOffsets table
* AddressInfoOffsets table
* File table
* String table
* FunctionInfo data
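
One way to picture the global data table is as a list of typed entries,
each carrying a 64 bit offset and size. The sketch below is an assumption
for illustration only: the commit message names the `GlobalInfoType` enum,
but the enumerator names, the `GlobalDataEntry` struct, and the
`findSection` and `makeExampleTable` helpers are hypothetical and may not
match the actual headers.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

namespace sketch {

// Hypothetical enumerators mirroring the list above; the real values and
// names in GSYM may differ.
enum class GlobalInfoType : uint32_t {
  UUID,
  AddrOffsets,
  AddrInfoOffsets,
  FileTable,
  StringTable,
  FunctionInfo,
};

// Hypothetical table entry: one section located by a 64 bit file offset
// and a 64 bit file size.
struct GlobalDataEntry {
  GlobalInfoType Type;
  uint64_t FileOffset;
  uint64_t FileSize;
};

// Finds a section by type, independent of where it sits in the table, so
// the on-disk order of sections is free to change.
inline std::optional<GlobalDataEntry>
findSection(const std::vector<GlobalDataEntry> &Table, GlobalInfoType Type) {
  for (const GlobalDataEntry &Entry : Table)
    if (Entry.Type == Type)
      return Entry;
  return std::nullopt;
}

// Builds a small example table in which the string table starts past 4 GiB,
// which a 32 bit offset field could not describe.
inline std::vector<GlobalDataEntry> makeExampleTable() {
  const uint64_t FourGiB = 1ULL << 32;
  return {
      {GlobalInfoType::FunctionInfo, 0x1000, FourGiB},
      {GlobalInfoType::StringTable, 0x1000 + FourGiB, 0x2000},
  };
}

} // namespace sketch
```

Because a reader looks sections up by type rather than by position, an
optional section such as the UUID can simply be absent from the table.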

We are also adding a new `StringTableEncoding` enum so that new string
table encodings can be added in the future.

GSYM V2 files can be produced by using the new `--output-version=2`
option. For example:
```
llvm-gsymutil --convert my.dSYM -o my.gSYM --output-version=2
```


# Tests

**Unit tests**: Extended existing tests (`GSYMTest.cpp`) to cover both
v1 and v2. Added new V2 tests (`GSYMV2Test.cpp`).
```
ninja DebugInfoGSYMTests SupportTests
unittests/DebugInfo/GSYM/DebugInfoGSYMTests
unittests/Support/SupportTests --gtest_filter='*DataExtractor*'
bin/llvm-lit \
  ../llvm-project/llvm/test/tools/llvm-gsymutil/X86/elf-dwarf.yaml \
  ../llvm-project/llvm/test/tools/llvm-gsymutil/X86/mach-dwarf.yaml
```

**Parity tests to V1 (manual)**:
* All tests were conducted on a very large DSYM (9.24 GB) and the GSYMs
generated from it (3.42~3.57 GB).
* All correctness tests were conducted on both little-endian and
big-endian machines.
* **Data parity (on-par)**: The new gsymutil [generates the exact same
GSYM v1 file as the baseline
gsymutil](https://gist.github.com/royitaqi/746c15ec22725cf89a1f5c6d9fb396aa),
both using a single thread.
* **Performance parity/improvement (on-par; v2 convert is 2.8x
faster)**:
  * **Parse + lookup**: The new gsymutil is [2% (GSYM v1 file) and 3%
    (GSYM v2 file) slower than the baseline
    gsymutil](https://gist.github.com/royitaqi/9789b92d63ae74a806c776f32a33e4fb)
    (1.56s vs. 1.52s).
  * **Convert**: The new gsymutil is [on par (GSYM v1 file) and 2.8x
    faster (GSYM v2 file) than the baseline
    gsymutil](https://gist.github.com/royitaqi/17aa69408bc5a3416fae9e192b1dc1ce)
    (102s vs. 288s).
* **Memory footprint parity (on-par)**:
  * **Convert**: The new gsymutil uses the same amount of memory as the
    baseline gsymutil (13 GB when converting a 2.52 GB DSYM; 42 GB when
    converting a 9.24 GB DSYM) since the memory peak is in `finalize()`,
    which runs before `encode()`.
* **Segment correctness (on-par)**: The new gsymutil [generates
segmented GSYM v2
files](https://gist.github.com/royitaqi/9bea5b3d50f13247d577ea91fbc6368a),
whose content
([seg1](https://gist.github.com/royitaqi/5b95a793b548f2f44edca250d64366b4),
[seg2](https://gist.github.com/royitaqi/f64730cbd1885ac9c0b9136e03e7742e),
[seg3](https://gist.github.com/royitaqi/2ce585fae910c57784cafafe4fb4cdcb))
match that of a [single GSYM v2
file](https://gist.github.com/royitaqi/475499477b9dbb1dea81f5d653ec39bf).


