[all-commits] [llvm/llvm-project] 4df795: [Serialization] Delta-encode consecutive SourceLoc...
Sam McCall via All-commits
all-commits at lists.llvm.org
Thu May 19 00:41:02 PDT 2022
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 4df795bff75289941508d07bbe9105b93b098105
https://github.com/llvm/llvm-project/commit/4df795bff75289941508d07bbe9105b93b098105
Author: Sam McCall <sam.mccall at gmail.com>
Date: 2022-05-19 (Thu, 19 May 2022)
Changed paths:
M clang/include/clang/Serialization/ASTReader.h
M clang/include/clang/Serialization/ASTRecordReader.h
M clang/include/clang/Serialization/ASTRecordWriter.h
M clang/include/clang/Serialization/ASTWriter.h
A clang/include/clang/Serialization/SourceLocationEncoding.h
M clang/lib/Serialization/ASTReader.cpp
M clang/lib/Serialization/ASTWriter.cpp
M clang/unittests/Serialization/CMakeLists.txt
A clang/unittests/Serialization/SourceLocationEncodingTest.cpp
Log Message:
-----------
[Serialization] Delta-encode consecutive SourceLocations in TypeLoc
Much of the size of PCH/PCM files comes from stored SourceLocations.
These are encoded using (almost) their raw value, VBR-encoded. Absolute
SourceLocations can be relatively large numbers, so this commonly takes
20-30 bits per location.
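To make the 20-30 bit figure concrete, here is a rough cost model. It assumes each location is written as a 6-bit VBR field (LLVM's bitstream format uses VBR6 for the operands of unabbreviated records), where every chunk carries 5 payload bits plus a continuation bit. `vbrBits` is a hypothetical helper for illustration, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of bits needed to write Value as a VBR field
// with the given chunk width. Each chunk holds (ChunkWidth - 1) payload bits
// plus one continuation bit, and even zero occupies one chunk.
unsigned vbrBits(uint64_t Value, unsigned ChunkWidth = 6) {
  assert(ChunkWidth >= 2 && ChunkWidth <= 64 && "keep the shift well-defined");
  const unsigned Payload = ChunkWidth - 1;
  unsigned Chunks = 1;
  while (Value >>= Payload) // drop one chunk's worth of payload bits
    ++Chunks;
  return Chunks * ChunkWidth;
}
```

Under this model an absolute offset around 0x12345 (17 significant bits) costs 24 bits and one around 0x1234567 (25 significant bits) costs 30, while a small delta such as 3 fits in a single 6-bit chunk.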
We can reduce this by exploiting redundancy: many SourceLocations stored near
each other differ only slightly in value, so they can be delta-encoded.
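A minimal sketch of delta-encoding a sequence of locations, assuming they are plain 32-bit offsets (it ignores the macro-ID bit a real SourceLocation carries). Deltas can be negative, so here they are zig-zag mapped to unsigned values before VBR encoding; that is one common way to handle signed deltas, and the function names are hypothetical rather than the patch's API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Zig-zag mapping: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
// so deltas of small magnitude stay small regardless of sign.
uint64_t zigZagEncode(int64_t V) {
  return (static_cast<uint64_t>(V) << 1) ^ static_cast<uint64_t>(V >> 63);
}

int64_t zigZagDecode(uint64_t U) {
  return static_cast<int64_t>(U >> 1) ^ -static_cast<int64_t>(U & 1);
}

// Store each location as the difference from the previous one. The first
// location is a delta from 0, so it stays full-size (plus the zig-zag bit).
std::vector<uint64_t> encodeSequence(const std::vector<uint32_t> &Locs) {
  std::vector<uint64_t> Out;
  int64_t Prev = 0;
  for (uint32_t Loc : Locs) {
    Out.push_back(zigZagEncode(static_cast<int64_t>(Loc) - Prev));
    Prev = Loc;
  }
  return Out;
}

// Inverse of encodeSequence: accumulate deltas back into absolute values.
std::vector<uint32_t> decodeSequence(const std::vector<uint64_t> &Enc) {
  std::vector<uint32_t> Out;
  int64_t Prev = 0;
  for (uint64_t E : Enc) {
    Prev += zigZagDecode(E);
    assert(Prev >= 0 && Prev <= static_cast<int64_t>(UINT32_MAX));
    Out.push_back(static_cast<uint32_t>(Prev));
  }
  return Out;
}
```

For the locations {100000, 100004, 100010, 100007} this produces {200000, 8, 12, 5}. With 6-bit VBR fields that is 24 + 6 + 6 + 6 = 42 bits, versus 4 × 24 = 96 bits for the raw values.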
Random-access loading of AST nodes limits how long these sequences
can be, but we can at least delta-encode within a node that is always
deserialized as an atomic unit.
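The random-access constraint can be sketched under a hypothetical model in which each AST node is serialized as one record and records are loaded in arbitrary order. The delta state is reset at the start of every node, so decoding one node never requires decoding another. `Record`, `writeNode`, and `readNode` are illustrative names, and plain signed deltas stand in for a compact encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: one record per AST node, whose operands are that
// node's source locations stored as deltas.
using Record = std::vector<int64_t>;

// The delta state starts fresh for every node, so no record depends on
// data stored in another record.
Record writeNode(const std::vector<uint32_t> &Locs) {
  Record R;
  int64_t Prev = 0;
  for (uint32_t Loc : Locs) {
    R.push_back(static_cast<int64_t>(Loc) - Prev);
    Prev = Loc;
  }
  return R;
}

// Decode one node in isolation: only its own record is consulted.
std::vector<uint32_t> readNode(const std::vector<Record> &File,
                               std::size_t Index) {
  assert(Index < File.size());
  std::vector<uint32_t> Locs;
  int64_t Prev = 0;
  for (int64_t Delta : File[Index]) {
    Prev += Delta;
    Locs.push_back(static_cast<uint32_t>(Prev));
  }
  return Locs;
}
```

If `Prev` were instead carried across nodes, the first delta of a node following one that ends at 510 would be computed against 510, and reading that node would first require decoding its predecessor, which lazy loading of individual AST nodes cannot assume.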
This patch applies the technique only to TypeLoc, as it's a relatively
small change that shows most of the API.
This saves ~3.5% of PCH size. I have local changes applying this technique
further that save another 3%, and I think it's possible to get to 10% total.
Differential Revision: https://reviews.llvm.org/D125403