[llvm-branch-commits] [clang] [Serialization] Code cleanups and polish 83233 (PR #83237)

Chuanqi Xu via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Tue Oct 22 18:44:25 PDT 2024


ChuanqiXu9 wrote:

> @usx95 may be able to help with the reproducer.
> 
> In the meantime, I'm trying to collect some information on the compile times. So far it looks like we have a ~10-15x compile time regression on some translation units. Without this patch `-ftime-report` shows:
> 
> ```
> ===-------------------------------------------------------------------------===
>                           Clang front-end time report
> ===-------------------------------------------------------------------------===
>   Total Execution Time: 39.1940 seconds (39.7238 wall clock)
> 
>    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>   28.2611 ( 77.5%)   1.8439 ( 67.3%)  30.1050 ( 76.8%)  30.5230 ( 76.8%)  Clang front-end timer
>    8.1911 ( 22.5%)   0.8980 ( 32.7%)   9.0891 ( 23.2%)   9.2009 ( 23.2%)  Reading modules
>   36.4522 (100.0%)   2.7419 (100.0%)  39.1940 (100.0%)  39.7238 (100.0%)  Total
> ```
> 
> With it:
> 
> ```
> ===-------------------------------------------------------------------------===
>                           Clang front-end time report
> ===-------------------------------------------------------------------------===
>   Total Execution Time: 466.7373 seconds (1251.6300 wall clock)
> 
>    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>  404.7200 ( 96.1%)  40.6383 ( 88.8%)  445.3583 ( 95.4%)  471.9647 ( 37.7%)  Clang front-end timer
>   15.2098 (  3.6%)   3.3586 (  7.3%)   18.5684 (  4.0%)  398.1242 ( 31.8%)  Reading modules
>  420.9899 (100.0%)  45.7474 (100.0%)  466.7373 (100.0%)  1251.6300 (100.0%)  Total
> ```
> 
> `perf record -g` / `perf report` give the following picture:
> 
> ```
>   Children      Self  Command  Shared Object       Symbol
> +   94.85%     0.00%  clang    clang               [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformCallExpr(clang::CallExpr*) [clone .__uniq.16014532493918845222783194145290083557]
> +   93.47%     0.00%  clang    clang               [.] clang::Sema::InstantiateFunctionDefinition(clang::SourceLocation, clang::FunctionDecl*, bool, bool, bool)
> +   93.37%    83.51%  clang    clang               [.] clang::ASTReader::LoadExternalSpecializations(clang::Decl const*, bool)
> +   93.19%     0.00%  clang    clang               [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformCompoundStmt(clang::CompoundStmt*, bool) [clone .__uniq.16014532493918845222783194
> +   93.08%     0.00%  clang    clang               [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformUnresolvedLookupExpr(clang::UnresolvedLookupExpr*, bool) [clone .__uniq.1601453249
> +   92.98%     0.00%  clang    clang               [.] clang::Sema::BuildTemplateIdExpr(clang::CXXScopeSpec const&, clang::SourceLocation, clang::LookupResult&, bool, clang::TemplateArgumentListInfo const*)
> +   92.44%     0.00%  clang    clang               [.] clang::Sema::CheckVarTemplateId(clang::VarTemplateDecl*, clang::SourceLocation, clang::SourceLocation, clang::TemplateArgumentListInfo const&)
> +   92.08%     0.00%  clang    clang               [.] clang::Sema::InstantiateVariableInitializer(clang::VarDecl*, clang::VarDecl*, clang::MultiLevelTemplateArgumentList const&)
> +   91.87%     0.00%  clang    clang               [.] clang::VarTemplateDecl::getPartialSpecializations(llvm::SmallVectorImpl<clang::VarTemplatePartialSpecializationDecl*>&) const
> +   91.18%     0.00%  clang    clang               [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformBinaryOperator(clang::BinaryOperator*) [clone .__uniq.1601453249391884522278319414
> +   91.07%     0.00%  clang    clang               [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformExprs(clang::Expr* const*, unsigned int, bool, llvm::SmallVectorImpl<clang::Expr*>
> +   90.70%     0.01%  clang    clang               [.] clang::Sema::InstantiateVariableDefinition(clang::SourceLocation, clang::VarDecl*, bool, bool, bool)
> +   90.41%     0.01%  clang    clang               [.] clang::Sema::BuildDeclarationNameExpr(clang::CXXScopeSpec const&, clang::DeclarationNameInfo const&, clang::NamedDecl*, clang::NamedDecl*, clang::TemplateArgu
> +   90.29%     0.00%  clang    clang               [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformInitListExpr(clang::InitListExpr*) [clone .__uniq.16014532493918845222783194145290
> +   89.92%     0.00%  clang    clang               [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformParenExpr(clang::ParenExpr*) [clone .__uniq.16014532493918845222783194145290083557
> +   89.23%     0.00%  clang    clang               [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformConditionalOperator(clang::ConditionalOperator*) [clone .__uniq.160145324939188452
> +   84.49%     0.02%  clang    clang               [.] clang::Sema::RequireCompleteTypeImpl(clang::SourceLocation, clang::QualType, clang::Sema::CompleteTypeKind, clang::Sema::TypeDiagnoser*)
> +   84.47%     0.00%  clang    clang               [.] clang::Sema::InstantiateClassTemplateSpecialization(clang::SourceLocation, clang::ClassTemplateSpecializationDecl*, clang::TemplateSpecializationKind, bool)
> +   84.07%     0.01%  clang    clang               [.] clang::Sema::InstantiateClass(clang::SourceLocation, clang::CXXRecordDecl*, clang::CXXRecordDecl*, clang::MultiLevelTemplateArgumentList const&, clang::Templa
> +   82.84%     0.02%  clang    clang               [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformType(clang::TypeLocBuilder&, clang::TypeLoc) [clone .__uniq.1601453249391884522278
> +   82.23%     0.02%  clang    clang               [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformTemplateSpecializationType(clang::TypeLocBuilder&, clang::TemplateSpecializationTy
> +   81.99%     0.01%  clang    clang               [.] (anonymous namespace)::TemplateInstantiator::TransformTemplateArgument(clang::TemplateArgumentLoc const&, clang::TemplateArgumentLoc&, bool) [clone .__uniq.16
> +   81.54%     0.00%  clang    clang               [.] clang::Sema::RequireCompleteDeclContext(clang::CXXScopeSpec&, clang::DeclContext*)
> +   80.18%     0.01%  clang    clang               [.] clang::TreeTransform<(anonymous namespace)::TemplateInstantiator>::TransformType(clang::TypeSourceInfo*) [clone .__uniq.16014532493918845222783194145290083557
> +   79.88%     0.12%  clang    clang               [.] clang::Sema::CheckTemplateIdType(clang::TemplateName, clang::SourceLocation, clang::TemplateArgumentListInfo&)
> ```
> 
> I can try to build clang with better debug information and get a higher fidelity profile, but hopefully this already shows the direction to look at.

Thanks. It looks like `ASTReader::LoadExternalSpecializations(const Decl *D, bool OnlyPartial)` is the hot spot, which I had not anticipated. One possible cause is `findAll()` itself, since on that path we always load all the specializations. Another is that we call `findAll()` too many times. I'll take a look. A profile with more detailed information would definitely be helpful.
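
To make the "called too many times" hypothesis concrete, here is a toy model of the cost difference. It is only a sketch and not the actual `ASTReader` code: `ToySpecializationTable`, `NaiveReader`, `GuardedReader`, and `workFor` are hypothetical names invented for illustration, and whether a guard like this is the right fix for the real code is an open question.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical stand-in for a serialized specialization table (not a Clang type).
struct ToySpecializationTable {
  std::size_t NumSpecs;
  std::size_t Deserialized = 0; // total entries touched so far

  explicit ToySpecializationTable(std::size_t N) : NumSpecs(N) {}

  // Toy analogue of a findAll()-style query: every call touches every entry.
  void findAll() { Deserialized += NumSpecs; }
};

// Hypothetical reader that reloads everything on each request.
struct NaiveReader {
  ToySpecializationTable &Table;
  explicit NaiveReader(ToySpecializationTable &T) : Table(T) {}

  void loadExternalSpecializations(int /*TemplateID*/) { Table.findAll(); }
};

// Hypothetical reader that remembers which templates it has fully loaded.
struct GuardedReader {
  ToySpecializationTable &Table;
  std::unordered_set<int> FullyLoaded;
  explicit GuardedReader(ToySpecializationTable &T) : Table(T) {}

  void loadExternalSpecializations(int TemplateID) {
    // insert() reports whether the ID was new; skip the load if it was not.
    if (!FullyLoaded.insert(TemplateID).second)
      return;
    Table.findAll();
  }
};

// Issue `Requests` load requests for one template and report the work done.
template <typename Reader>
std::size_t workFor(std::size_t NumSpecs, int Requests) {
  assert(Requests >= 0 && "request count must be non-negative");
  ToySpecializationTable Table(NumSpecs);
  Reader R(Table);
  for (int I = 0; I < Requests; ++I)
    R.loadExternalSpecializations(/*TemplateID=*/0);
  return Table.Deserialized;
}
```

In this model, `workFor<NaiveReader>(1000, 50)` touches 50 * 1000 entries while `workFor<GuardedReader>(1000, 50)` touches 1000, so the naive cost grows with the number of requests rather than with the number of specializations.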

https://github.com/llvm/llvm-project/pull/83237

