[llvm-dev] [RFC] Optimization remarks: LLVM bitstream format and future plans
Francis Visoiu Mistrih via llvm-dev
llvm-dev at lists.llvm.org
Tue Jun 18 13:43:02 PDT 2019
Hello,
We have been looking into making optimization remarks more scalable.
We looked into a few formats that satisfy the following requirements:
1. allows streaming to a file: we want to avoid keeping all the remarks in memory
2. allows string deduplication: most of the strings are repeated [1]
3. is fast to parse: building clang with remarks results in 24,205,892 remarks
4. is compact to save on disk: building clang with YAML remarks results in 17.6GB of remarks
5. supports some kind of key-value pairing: we need to have arbitrary remark “arguments” [2]
We took a look at a few formats:
* YAML: requirements 3 and 4 (fast parsing and compact size on disk) are very far from being met by this format.
* MessagePack [3]: having support for this in LLVM is an advantage for this format. It allowed us to make parsing 5.5x faster and remark files more than 2x smaller.
* clangd’s RIFF-based format [4]: requirements 1 and 5 (streaming to a file and key-value pairing) are not satisfied here.
* .dia: parsing this format (using libclang) is not fast enough for us.
* custom format: we managed to make remarks 11x smaller, and parsing fast enough. The main concern with a custom format is the maintenance and versioning of the format.
* LLVM bitstream:
1. by emitting a block per remark, we can stream to a file
2. by using a string table that is stored separately, in the metadata, we can deduplicate strings
3. llvm-bcanalyzer runs in 20s over all the remark files for clang
4. total size of remarks for clang is 1.3GB -> 13.4x smaller
5. we can have an arbitrary number of records and describe them using abbreviations to provide a key-value-like pairing
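To illustrate points 1 and 2, here is a minimal Python sketch of the general idea: each remark is written out as soon as it is produced, and its strings are replaced by IDs into a table that is written once at the end. This is not the bitstream encoding from D63466; the names `StringTable`, `emit_remark` and `emit_string_table`, and the fixed-width record layout, are hypothetical and chosen only for illustration.

```python
import io
import struct


class StringTable:
    """Maps each distinct string to a small integer ID (hypothetical helper)."""

    def __init__(self):
        self._ids = {}

    def intern(self, s):
        # Reuse the existing ID if the string was seen before,
        # otherwise assign the next free one.
        return self._ids.setdefault(s, len(self._ids))

    def strings(self):
        # dicts preserve insertion order, so list position == ID.
        return list(self._ids)


def emit_remark(out, strtab, pass_name, remark_name, function):
    """Write one remark immediately, as three 32-bit string IDs."""
    ids = [strtab.intern(s) for s in (pass_name, remark_name, function)]
    out.write(struct.pack("<3I", *ids))


def emit_string_table(out, strtab):
    """Write the table once, after all remarks, as NUL-terminated strings."""
    for s in strtab.strings():
        out.write(s.encode("utf-8") + b"\0")


out = io.BytesIO()  # stands in for the output file
strtab = StringTable()
emit_remark(out, strtab, "inline", "NoDefinition", "printArgsNoRet")
emit_remark(out, strtab, "inline", "NoDefinition", "main")
remarks_size = out.tell()  # bytes used by the two remark records
emit_string_table(out, strtab)
```

Only the string table has to stay in memory while remarks are being emitted; repeated strings such as the pass name cost one table entry plus a small ID per use.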
We decided to go ahead with LLVM bitstream since it satisfies all our requirements and it is well-known by the community.
The remark generation part of the format is available for review at: https://reviews.llvm.org/D63466.
Another goal is to make it easy to find remarks for a given object file or binary. The way we want to do this on Darwin is to follow the debug info model: add a section to the object file, make the linker ignore it, let dsymutil pack it up and put the final result in the .dSYM bundle.
For that, I plan on making a few more changes:
* Emit the bitstream metadata in the __LLVM,__remarks/.remarks section
* Add the parsing logic to lib/Remarks/RemarksParser and make it usable through the C API
* Add a tool, llvm-remarkutil, to merge the remarks from the object files into a standalone remark file
* Add support to dsymutil to merge and generate a standalone remark file in the .dSYM bundle
* Add support to llvm-remarkutil to convert from YAML to bitstream, to extract metadata from sections, and other utilities
Please let me know what you think!
Thanks,
—
Francis
[1] 2x size reduction with https://reviews.llvm.org/rG7fee2b89fd6e5101bc590e0741f4d7a82b7715e1
[2] Usually, remarks have arbitrary arguments, like the “Args” part of:
```
--- !Missed
Pass: inline
Name: NoDefinition
DebugLoc: { File: 'test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c',
            Line: 7, Column: 3 }
Function: printArgsNoRet
Args:
  - Callee: printf
  - String: ' will not be inlined into '
  - Caller: printArgsNoRet
    DebugLoc: { File: 'test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c',
                Line: 6, Column: 0 }
  - String: ' because its definition is unavailable'
...
```
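As a rough illustration of why the arguments need key-value pairing, the sketch below models the remark above as plain Python data and rebuilds the human-readable message by concatenating the argument values in order. The dictionary layout and the `render_message` helper are assumptions made for this example, not an LLVM API.

```python
# The remark from the YAML example, as plain Python data (assumed layout).
remark = {
    "Type": "Missed",
    "Pass": "inline",
    "Name": "NoDefinition",
    "Function": "printArgsNoRet",
    "Args": [
        {"Callee": "printf"},
        {"String": " will not be inlined into "},
        {
            "Caller": "printArgsNoRet",
            "DebugLoc": {
                "File": "test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c",
                "Line": 6,
                "Column": 0,
            },
        },
        {"String": " because its definition is unavailable"},
    ],
}


def render_message(remark):
    """Concatenate the value of each argument, skipping its DebugLoc."""
    parts = []
    for arg in remark["Args"]:
        for key, value in arg.items():
            # Each argument has one key-value pair carrying its text;
            # an optional DebugLoc only annotates where it comes from.
            if key != "DebugLoc":
                parts.append(str(value))
                break
    return "".join(parts)


message = render_message(remark)
# message == "printf will not be inlined into printArgsNoRet"
#            " because its definition is unavailable"
```

The keys (Callee, Caller, String) are arbitrary and differ from remark to remark, which is why the format cannot assume a fixed set of fields.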
[3] https://msgpack.org/index.html
[4] https://reviews.llvm.org/rG50f3631057f717448ba34b4175daaa81215fbd5e