[PATCH] D114095: [clang][lex] Include tracking: simplify and move to preprocessor
Jan Svoboda via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Dec 10 09:57:10 PST 2021
jansvoboda11 added a comment.
In D114095#3160103 <https://reviews.llvm.org/D114095#3160103>, @vsapsai wrote:
> I've mentioned it in D112915 <https://reviews.llvm.org/D112915>, as we discussed the stored data format there. My concern was that bitvector packing might not be the most space-efficient encoding. I haven't done proper testing, just an off-the-cuff comparison, and it looks like for most frameworks in the iOS SDK, storing the included headers per submodule takes less space than encoding them as a bitvector. I have an idea why that might be happening, but I haven't checked it in the debugger, so I'll keep it to myself to avoid derailing the discussion.
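To make the two encodings concrete, here is a minimal sketch under simplifying assumptions: each submodule stores either (a) a bitvector with one bit per input file of the owning module, packed into 32-bit words, or (b) one 32-bit ID per header it includes. The helper names `packAsBitvector`, `packAsIDVector` and `payloadBytes` are hypothetical, not Clang APIs, and the sketch ignores bitstream record headers, abbreviations and VBR encoding, so it only approximates what actually ends up in a .pcm file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one submodule's "included files" record.
// Input files of the owning module are identified by dense indices
// 0 .. numInputFiles-1 (an assumption made for this sketch).

// Encoding (a): one bit per input file of the module, packed into 32-bit
// words. Bit `id` is set if the submodule includes input file `id`.
std::vector<std::uint32_t> packAsBitvector(std::size_t numInputFiles,
                                           const std::vector<std::uint32_t> &included) {
  std::vector<std::uint32_t> words((numInputFiles + 31) / 32, 0);
  for (std::uint32_t id : included) {
    assert(id < numInputFiles && "file ID out of range");
    words[id / 32] |= std::uint32_t{1} << (id % 32);
  }
  return words;
}

// Encoding (b): the included file IDs themselves, one 32-bit integer each,
// in the order given.
std::vector<std::uint32_t> packAsIDVector(const std::vector<std::uint32_t> &included) {
  return included;
}

// Raw payload size in bytes, ignoring any serialization overhead.
std::size_t payloadBytes(const std::vector<std::uint32_t> &words) {
  return words.size() * sizeof(std::uint32_t);
}
```

For example, in a module with 1000 input files, a submodule that includes 5 of them needs 32 words (128 bytes) as a bitvector but only 20 bytes as an ID vector: the bitvector cost depends only on how many input files the module has, while the ID vector grows with the number of includes.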
Let's bring the conversation over here. I ran the same UIKit test you did and compared the following:
- current trunk
- current trunk with this patch
- current trunk with this patch, with the bitvector replaced by a vector of IDs (32-bit integers).
The following table shows the sizes of the resulting .pcm files in bytes, with each configuration's delta relative to trunk; the last row is the total over all 23 files:
+----------+----------------+----------------+
| trunk    | bit vector     | ID vector      |
+----------+----------------+----------------+
|   281932 |   281944   +12 |   281988   +56 |
|   989840 |   989784   -56 |   989968  +128 |
|   837116 |   837084   -32 |   837212   +96 |
|   899924 |   899912   -12 |   900004   +80 |
|   710296 |   710296    +0 |   710376   +80 |
|   273140 |   273144    +4 |   273196   +56 |
|  3649856 |  3649024  -832 |  3650804  +948 |
|   207676 |   207692   +16 |   207740   +64 |
|   342792 |   342804   +12 |   342860   +68 |
|  4137660 |  4137460  -200 |  4137940  +280 |
|   173536 |   173564   +28 |   173580   +44 |
|   787120 |   787144   +24 |   787180   +60 |
|  1260652 |  1260596   -56 |  1260804  +152 |
|   255072 |   255092   +20 |   255128   +56 |
|   973204 |   973228   +24 |   973268   +64 |
|   398952 |   398940   -12 |   399036   +84 |
|   631516 |   631516    +0 |   631588   +72 |
|  5252932 |  5252348  -584 |  5253612  +680 |
|   230160 |   230168    +8 |   230228   +68 |
|    24460 |    24500   +40 |    24500   +40 |
|    53244 |    53280   +36 |    53288   +44 |
|    75932 |    75952   +20 |    75972   +40 |
|    32840 |    32876   +36 |    32884   +44 |
+----------+----------------+----------------+
| 22479852 | 22478348 -1504 | 22483156 +3304 |
+----------+----------------+----------------+
Used command:
echo '#import <UIKit/UIKit.h>' | ./bin/clang -fsyntax-only -isysroot "$(xcrun --sdk iphoneos --show-sdk-path)" -target arm64-apple-ios -fmodules -fmodules-cache-path=modules.noindex -x objective-c -
Patch I applied on top of the one under review to get the vector of IDs:
F20979504: bitvector-to-id-vector.diff <https://reviews.llvm.org/F20979504>
I see how the bitvector could explode for large, fine-grained modules: they have lots of input files (-> large bitvectors in each submodule), but each submodule only includes a handful of files (-> the bitvectors are sparse). Judging by the numbers above, this doesn't seem to happen in practice, at least in our SDK.
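The break-even point can be written down under a simplified model, assumed here for illustration: the bitvector spends one bit per input file of the module, rounded up to 32-bit words, the ID vector spends one 32-bit integer per included file, and all record overhead is ignored. Let $N$ be the number of input files of the module and $k$ the number of files a given submodule includes (both symbols introduced only for this sketch).

```latex
\begin{aligned}
\text{bitvector size (bits)} &= 32 \left\lceil \frac{N}{32} \right\rceil \approx N \\
\text{ID vector size (bits)} &= 32\,k \\
\text{bitvector is no larger} &\iff 32 \left\lceil \frac{N}{32} \right\rceil \le 32\,k
  \iff k \ge \left\lceil \frac{N}{32} \right\rceil
\end{aligned}
```

So, in this model, the bitvector stays smaller as long as a submodule includes roughly one out of every 32 of its module's input files or more; the sparse case (large $N$, small $k$) is where the ID vector would win. The measured deltas also include serialization overhead that this model ignores.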
@vsapsai Do you think this warrants more thorough investigation?
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D114095