[PATCH] D109632: [clang] de-duplicate methods from AST files

Volodymyr Sapsai via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Sep 24 20:14:53 PDT 2021


vsapsai added a comment.

In D109632#3021644 <https://reviews.llvm.org/D109632#3021644>, @rmaz wrote:

> @vsapsai just checking on where we are with this one. Is the current plan to investigate an approach only serializing the methods declared in each module, or are we good to go ahead with the set de-duplication approach? I tried profiling with D110123 <https://reviews.llvm.org/D110123> and with the following patch:
>
>   diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
>   index 279067a8ac8b..aaaca0aff9ab 100644
>   --- a/clang/lib/Serialization/ASTReader.cpp
>   +++ b/clang/lib/Serialization/ASTReader.cpp
>   @@ -8130,7 +8130,7 @@ namespace serialization {
>          FactoryBits = Data.FactoryBits;
>          InstanceHasMoreThanOneDecl = Data.InstanceHasMoreThanOneDecl;
>          FactoryHasMoreThanOneDecl = Data.FactoryHasMoreThanOneDecl;
>   -      return true;
>   +      return false;
>        }
>
> but this was a fair bit slower than the deduplication approach, and for some files would never complete, presumably stuck in an infinite loop of dependent modules. Is there a way to exclude the serialization of methods from dependent modules but make the method lookup more efficient somehow?

That patch looks correct, I was experimenting with exactly the same local change. Have you tried D110123 <https://reviews.llvm.org/D110123> on your original build? In my testing with synthetic test case it looks as good as set deduplication. In its current state D110123 <https://reviews.llvm.org/D110123> isn't really representative for actual code bases because it relies on method lists of specific shape (all non-transitive methods coming after all transitive methods). You can try to add deduplication stats to see that there is less deduplication but it's not entirely eliminated. At least that's my understanding which can be flawed. Though the fact that clang didn't complete for some files is super strange and concerning, have no idea why is it happening and will need to experiment on a real-life project.

I still think it is worth to pursue writing only owned methods in `METHOD_POOL`. Deduplication with a DenseSet works but I still expect it to be more expensive than avoiding duplicates by construction. As for the next step I was thinking about changing `ASTMethodPoolTrait` to skip transitive methods during serialization and it should help with eliminating all the duplicates. With that change we can test and see how it works.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109632/new/

https://reviews.llvm.org/D109632



More information about the cfe-commits mailing list