[PATCH] D109632: [clang] de-duplicate methods from AST files
Volodymyr Sapsai via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Sep 17 18:04:28 PDT 2021
vsapsai added a comment.
In D109632#3006520 <https://reviews.llvm.org/D109632#3006520>, @rmaz wrote:
> The case we have is more like:
>
> .m -> A -> long list of partially shared deps -> Foundation
>    -> B -> long list of partially shared deps -> Foundation
>    -> C -> long list of partially shared deps -> Foundation
>    .... * a few hundred
>
> So we have a file that imports a lot of modules, in the hundreds. Each of those modules has multiple ObjC interfaces with `-(id)init NS_UNAVAILABLE` and imports Foundation, UIKit and also a large number of libraries that are shared across the top level imports. This will result in A.pcm, B.pcm and C.pcm including hundreds or thousands of init decls that are the same, from system frameworks or whatever modules are shared between the top level imports.
>
> IIUC the code currently serializes the entire ObjCMethodList for a module for every declared method, including the methods that are not part of that module. When deserializing we don't descend into module dependencies as the entire method list would already be deserialized, but that doesn't help for modules that aren't directly dependent. Is this right? If so it seems another approach could be to only serialize the methods declared in that module itself, and during deserialization we would have to load the methods from all dependent modules.
I have created a different test synthesizer, synthesize-shared-framework-chain-test.py, to reproduce the described framework layout. In `addMethodsToPool` I added a set `seen` to count how many methods we could skip. We don't skip anything when building the modules, but there are a lot of duplicates when compiling the .m file. I've also noticed that the size of a .pcm grows with the length of the module chain reachable from that module (19K for Shared0.pcm vs 38K for Shared49.pcm). I haven't checked whether that is because of ObjCMethodList or for another reason.
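For illustration, the counting experiment can be sketched roughly as below. This is a standalone approximation, not the actual clang code: `MethodDecl`, `PoolStats`, `addMethodsToPoolSketch` and the `declID` field are hypothetical stand-ins, and it assumes that two entries denote the same declaration exactly when their IDs match.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an Objective-C method declaration.
// This is NOT clang's ObjCMethodDecl.
struct MethodDecl {
  std::string owningModule; // module that declares the method
  std::string selector;     // e.g. "init"
  unsigned declID;          // assumption: equal IDs mean the same declaration
};

// Counters for one batch of incoming methods.
struct PoolStats {
  std::size_t added = 0;   // declarations seen for the first time
  std::size_t skipped = 0; // declarations already in the pool
};

// Appends each method from `incoming` to `pool` unless its ID is already in
// `seen`, and reports how many entries were added and how many were skipped.
PoolStats addMethodsToPoolSketch(const std::vector<MethodDecl> &incoming,
                                 std::vector<MethodDecl> &pool,
                                 std::unordered_set<unsigned> &seen) {
  PoolStats stats;
  for (const MethodDecl &method : incoming) {
    // insert() returns {iterator, bool}; the bool is false for a duplicate.
    if (seen.insert(method.declID).second) {
      pool.push_back(method);
      ++stats.added;
    } else {
      ++stats.skipped;
    }
  }
  return stats;
}
```

If the lists coming from A.pcm and B.pcm both carry the same Foundation `init` declaration, the second call reports it as skipped, which is the kind of duplicate showing up in the .m file compilation.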
I don't remember for sure, but I don't think there is a consistent policy on whether a module stores transitive data or only the data it owns. I suspect we use both approaches and need to check each case separately.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D109632/new/
https://reviews.llvm.org/D109632