[PATCH] D114095: [clang][lex] Include tracking: simplify and move to preprocessor

Fri Dec 10 09:57:10 PST 2021

jansvoboda11 added a comment.

In D114095#3160103 <https://reviews.llvm.org/D114095#3160103>, @vsapsai wrote:

> I've mentioned it in D112915 <https://reviews.llvm.org/D112915> as we've discussed the stored data format there. But my concern was that bitvector packing might not be the most space-efficient encoding. I haven't done proper testing, just an off-the-cuff comparison, and it looks like for most of the frameworks in the iOS SDK, storing included headers per submodule takes less space than encoding them as a bitvector. I have an idea why that might be happening, but I haven't checked it in a debugger, so I'll keep it to myself to avoid derailing the discussion.

Let's bring the conversation over here. I ran the same UIKit test you did and compared the following:

- current trunk
- current trunk with this patch
- current trunk with this patch, but with the bitvector replaced by a vector of IDs (32-bit integers)

The following table shows the sizes of the resulting .pcm files in bytes, with each variant's delta relative to trunk; the last row is the total:

  +----------+-----------------+-----------------+
  |   trunk  |    bit vector   |    ID vector    |
  +----------+-----------------+-----------------+
  |   281932 |   281944    +12 |   281988    +56 |
  |   989840 |   989784    -56 |   989968   +128 |
  |   837116 |   837084    -32 |   837212    +96 |
  |   899924 |   899912    -12 |   900004    +80 |
  |   710296 |   710296     +0 |   710376    +80 |
  |   273140 |   273144     +4 |   273196    +56 |
  |  3649856 |  3649024   -832 |  3650804   +948 |
  |   207676 |   207692    +16 |   207740    +64 |
  |   342792 |   342804    +12 |   342860    +68 |
  |  4137660 |  4137460   -200 |  4137940   +280 |
  |   173536 |   173564    +28 |   173580    +44 |
  |   787120 |   787144    +24 |   787180    +60 |
  |  1260652 |  1260596    -56 |  1260804   +152 |
  |   255072 |   255092    +20 |   255128    +56 |
  |   973204 |   973228    +24 |   973268    +64 |
  |   398952 |   398940    -12 |   399036    +84 |
  |   631516 |   631516     +0 |   631588    +72 |
  |  5252932 |  5252348   -584 |  5253612   +680 |
  |   230160 |   230168     +8 |   230228    +68 |
  |    24460 |    24500    +40 |    24500    +40 |
  |    53244 |    53280    +36 |    53288    +44 |
  |    75932 |    75952    +20 |    75972    +40 |
  |    32840 |    32876    +36 |    32884    +44 |
  +----------+-----------------+-----------------+
  | 22479852 | 22478348  -1504 | 22483156  +3304 |
  +----------+-----------------+-----------------+
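To put the totals in perspective, the deltas can be expressed relative to trunk. This is only a quick arithmetic sketch over the totals from the last row of the table; the function and constant names are made up for illustration. Both variants come out well under a hundredth of a percent for the bit vector and under two hundredths of a percent for the ID vector.

```python
# Totals copied from the last row of the table above (bytes).
TRUNK_TOTAL = 22479852
BIT_VECTOR_TOTAL = 22478348
ID_VECTOR_TOTAL = 22483156


def relative_delta_percent(baseline_bytes, variant_bytes):
    """Return the size change of a variant relative to the baseline, in percent."""
    return 100.0 * (variant_bytes - baseline_bytes) / baseline_bytes


def describe(label, variant_bytes):
    """Format one line with the absolute and relative delta against trunk."""
    delta = variant_bytes - TRUNK_TOTAL
    percent = relative_delta_percent(TRUNK_TOTAL, variant_bytes)
    return f"{label}: {delta:+d} bytes ({percent:+.4f}%)"


if __name__ == "__main__":
    print(describe("bit vector", BIT_VECTOR_TOTAL))
    print(describe("ID vector", ID_VECTOR_TOTAL))
```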

Command used:

  echo '#import <UIKit/UIKit.h>' | ./bin/clang -fsyntax-only -isysroot "$(xcrun --sdk iphoneos --show-sdk-path)" -target arm64-apple-ios -fmodules -fmodules-cache-path=modules.noindex -x objective-c -
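For anyone who wants to repeat the measurement, the per-module sizes can be collected from the module cache with something like the following. This is a rough sketch, not the exact script used for the table: the helper names are hypothetical, and it simply assumes every file ending in .pcm under the cache directory (modules.noindex in the command above) is a built module.

```python
from pathlib import Path


def pcm_sizes(cache_dir):
    """Map each .pcm file under cache_dir (by relative path) to its size in bytes."""
    root = Path(cache_dir)
    return {
        str(path.relative_to(root)): path.stat().st_size
        for path in sorted(root.rglob("*.pcm"))
    }


def total_size(sizes):
    """Sum the sizes of all collected .pcm files."""
    return sum(sizes.values())


def format_report(sizes):
    """Render one right-aligned size per line, followed by the total."""
    lines = [f"{size:>10}  {name}" for name, size in sizes.items()]
    lines.append(f"{total_size(sizes):>10}  total")
    return "\n".join(lines)
```

Calling `print(format_report(pcm_sizes("modules.noindex")))` after each build (with a clean cache directory per compiler variant) gives one column of the table.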

The patch I applied on top of the one under review to get the vector of IDs:

F20979504: bitvector-to-id-vector.diff <https://reviews.llvm.org/F20979504>

I see how the bitvector could explode for large fine-grained modules. They have lots of input files (-> large bitvectors in each submodule), but each submodule only includes a handful of files (-> the bitvectors are sparse). The numbers above suggest this doesn't actually happen, at least in our SDK: in total, the bit vector comes out 1504 bytes smaller than trunk, while the ID vector is 3304 bytes larger.
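The trade-off can be made concrete with a back-of-the-envelope model. The sketch below compares raw payload sizes only: one bit per input file for the bit vector versus one 32-bit ID per included file for the ID vector. It deliberately ignores whatever the bitstream encoding adds or saves on top (record headers, abbreviations, alignment), so it is an assumption-laden illustration rather than a description of what ends up in the .pcm file; all names are hypothetical.

```python
def bit_vector_bytes(num_input_files):
    """Raw size of a bit vector with one bit per input file of the module."""
    return (num_input_files + 7) // 8


def id_vector_bytes(num_included_files, id_bytes=4):
    """Raw size of a vector holding one fixed-width ID per included file."""
    return num_included_files * id_bytes


def id_vector_is_smaller(num_input_files, num_included_files):
    """True if the ID vector payload beats the bit vector for one submodule."""
    return id_vector_bytes(num_included_files) < bit_vector_bytes(num_input_files)


if __name__ == "__main__":
    # A submodule that includes 5 files, in modules of different sizes.
    for num_input_files in (100, 1000, 10000):
        print(
            num_input_files,
            bit_vector_bytes(num_input_files),
            id_vector_bytes(5),
            id_vector_is_smaller(num_input_files, 5),
        )
```

Under this model the ID vector only wins when a submodule includes fewer than roughly 1/32 of the module's input files, which is the sparse case described above.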

@vsapsai Do you think this warrants a more thorough investigation?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114095/new/

https://reviews.llvm.org/D114095