[all-commits] [llvm/llvm-project] 4df795: [Serialization] Delta-encode consecutive SourceLoc...

Sam McCall via All-commits all-commits at lists.llvm.org
Thu May 19 00:41:02 PDT 2022


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 4df795bff75289941508d07bbe9105b93b098105
      https://github.com/llvm/llvm-project/commit/4df795bff75289941508d07bbe9105b93b098105
  Author: Sam McCall <sam.mccall at gmail.com>
  Date:   2022-05-19 (Thu, 19 May 2022)

  Changed paths:
    M clang/include/clang/Serialization/ASTReader.h
    M clang/include/clang/Serialization/ASTRecordReader.h
    M clang/include/clang/Serialization/ASTRecordWriter.h
    M clang/include/clang/Serialization/ASTWriter.h
    A clang/include/clang/Serialization/SourceLocationEncoding.h
    M clang/lib/Serialization/ASTReader.cpp
    M clang/lib/Serialization/ASTWriter.cpp
    M clang/unittests/Serialization/CMakeLists.txt
    A clang/unittests/Serialization/SourceLocationEncodingTest.cpp

  Log Message:
  -----------
  [Serialization] Delta-encode consecutive SourceLocations in TypeLoc

Much of the size of PCH/PCM files comes from stored SourceLocations.
These are encoded using (almost) their raw value, VBR-encoded. Absolute
SourceLocations can be relatively large numbers, so this commonly takes
20-30 bits per location.
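
As a rough illustration of where the 20-30 bits come from, the sketch below
counts the bits a VBR encoding spends on a value. It assumes 6-bit chunks
(5 payload bits plus 1 continuation bit, as in LLVM bitstream VBR6); the chunk
width actually used for a given record is an assumption here, and vbrBits is a
hypothetical helper, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of bits needed to emit Value as a VBR with
// the given chunk width. Each chunk carries (Width - 1) payload bits plus
// one continuation bit, so even a zero value costs one full chunk.
inline unsigned vbrBits(uint64_t Value, unsigned Width = 6) {
  assert(Width >= 2 && "a chunk needs at least one payload bit");
  unsigned Chunks = 1;
  while (Value >>= (Width - 1))
    ++Chunks;
  return Chunks * Width;
}
```

Under that assumption an offset around 1,000,000 costs 24 bits and one around
20,000,000 costs 30 bits, while any value below 32 fits in a single 6-bit chunk.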

We can reduce this by exploiting redundancy: many SourceLocations stored
close together differ only slightly and can be delta-encoded.
Random-access loading of AST nodes constrains how long these sequences
can be, but we can do it at least within a node that always gets
deserialized as an atomic unit.
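
A minimal sketch of the general idea, not the code from this patch: within one
atomically-deserialized unit, each location is stored as a signed delta from the
previous one, zigzag-mapped so that small negative deltas also become small
unsigned values. All names below (zigzagEncode, deltaEncode, ...) are
hypothetical, and the real API in SourceLocationEncoding.h may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers; not the API from SourceLocationEncoding.h.

// Map signed deltas to unsigned values: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
inline uint64_t zigzagEncode(int64_t V) {
  uint64_t Sign = V < 0 ? ~uint64_t(0) : 0;
  return (static_cast<uint64_t>(V) << 1) ^ Sign;
}

inline int64_t zigzagDecode(uint64_t V) {
  return static_cast<int64_t>(V >> 1) ^ -static_cast<int64_t>(V & 1);
}

// Encode a sequence of raw locations. The first entry is a delta from 0,
// i.e. effectively absolute; later entries are deltas from their predecessor.
inline std::vector<uint64_t> deltaEncode(const std::vector<uint32_t> &Locs) {
  std::vector<uint64_t> Out;
  uint32_t Prev = 0;
  for (uint32_t L : Locs) {
    Out.push_back(zigzagEncode(int64_t(L) - int64_t(Prev)));
    Prev = L;
  }
  return Out;
}

// Inverse of deltaEncode. The whole sequence must be decoded in order,
// which is why it should not span independently-loaded nodes.
inline std::vector<uint32_t> deltaDecode(const std::vector<uint64_t> &Enc) {
  std::vector<uint32_t> Out;
  uint32_t Prev = 0;
  for (uint64_t E : Enc) {
    int64_t Next = int64_t(Prev) + zigzagDecode(E);
    assert(Next >= 0 && Next <= int64_t(UINT32_MAX) && "corrupt sequence");
    Prev = static_cast<uint32_t>(Next);
    Out.push_back(Prev);
  }
  return Out;
}
```

For example, the raw values 1000000, 1000004, 1000002, 1000010 would encode as
2000000, 8, 3, 16: only the first entry stays large, and the rest are small
enough to fit in a single 5-bit VBR payload.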

This patch implements the encoding for TypeLoc, as it's a relatively small
change that shows most of the API.
This saves ~3.5% of PCH size. I have local changes applying this technique
further that save another 3%, and I think it's possible to get to 10% total.

Differential Revision: https://reviews.llvm.org/D125403
