[clang] [Clang] Improve concept performance 1/N (PR #188421)

Younan Zhang via cfe-commits cfe-commits at lists.llvm.org
Wed Mar 25 00:13:47 PDT 2026


https://github.com/zyn0217 created https://github.com/llvm/llvm-project/pull/188421

The concept parameter mapping patch caused significant performance regressions in code that uses concepts heavily, even after atomic-expression-level caching was added.

After normalization, we often end up with large atomic expressions containing numerous duplicate and complex template parameter mappings. Previously, we were substituting and checking these repeatedly, which was highly inefficient.

We now cache these substitution results within TemplateInstantiator. This recovers a substantial part of the regression, as the following cases show:

| Regression case              | clang-21 | clang-22 | This patch |
|------------------------------|---------:|---------:|-----------:|
| usb_ids_gen.cpp              |    1.41s |    3.90s |      2.45s |
| inspector_style_resolver.cpp |   18.21s |   22.43s |     19.01s |

While performance is not yet as good as in clang-21, I think there is still room for improvement, e.g. caching invalid results for SFINAE diagnostics and avoiding redundant pack unpacking.
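To make the caching scheme concrete, here is a minimal, self-contained sketch of the idea under a toy model. It is not the patch itself. `ToyInstantiator`, `ProfileID` and the `T0`..`T9` argument spelling are hypothetical. `std::vector<std::string>` stands in for `llvm::FoldingSetNodeID`, and a `std::optional` result stands in for Clang's "returns true on error" convention. As in the patch's `TransformTemplateArgument` override, the cache key combines a profile of all outer template arguments (computed once per instantiator) with the argument being transformed, and only successful substitutions are cached.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Stand-in (assumption) for llvm::FoldingSetNodeID: an ordered sequence of
// profiled values that can be used as a map key.
using ProfileID = std::vector<std::string>;

// Hypothetical toy instantiator: substitutes parameters "T0".."T9" with the
// outer template arguments and memoizes the results in a shared cache.
class ToyInstantiator {
public:
  ToyInstantiator(const std::vector<std::string> &OuterArgs,
                  std::map<ProfileID, std::string> *Cache)
      // Profile the outer arguments once per instantiator, not per lookup.
      : Outer(OuterArgs), OuterProfile(OuterArgs), Cache(Cache) {}

  // Returns std::nullopt on substitution failure.
  std::optional<std::string> transform(const std::string &Input) {
    if (!Cache)
      return substitute(Input); // No active cache: always do the work.
    ProfileID ID = OuterProfile; // Key = outer arguments + input argument.
    ID.push_back(Input);
    if (auto It = Cache->find(ID); It != Cache->end())
      return It->second; // Cache hit: skip substitution entirely.
    std::optional<std::string> Result = substitute(Input);
    if (Result)
      Cache->insert({ID, *Result}); // Only successes are cached.
    return Result;
  }

  int SubstitutionCount = 0; // Number of real substitutions performed.

private:
  std::optional<std::string> substitute(const std::string &Input) {
    ++SubstitutionCount;
    if (Input.size() == 2 && Input[0] == 'T' && Input[1] >= '0' &&
        Input[1] <= '9') {
      std::size_t Index = static_cast<std::size_t>(Input[1] - '0');
      if (Index >= Outer.size())
        return std::nullopt; // No such outer argument: substitution fails.
      return Outer[Index];
    }
    return Input; // Non-dependent argument: returned unchanged.
  }

  std::vector<std::string> Outer;
  ProfileID OuterProfile;
  std::map<ProfileID, std::string> *Cache;
};
```

Feeding a duplicated mapping such as `T0, T1, T0, T0, T1` through one instantiator performs only two real substitutions. A failing argument such as `T5` is substituted again on every request, which is the kind of repeated work the "cache invalid results" follow-up mentioned above would presumably target. Keeping the outer arguments in the key is what stops an instantiator with different outer arguments from reusing another's results.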

From 58b0a6978f604aeab04e0fb027ea3b2c896e38a6 Mon Sep 17 00:00:00 2001
From: Younan Zhang <zyn7109 at gmail.com>
Date: Tue, 24 Mar 2026 19:09:11 +0800
Subject: [PATCH] [Clang] Improve concept performance 1/N

The concept parameter mapping patch caused significant performance
regressions in code that uses concepts heavily, even after
atomic-expression-level caching was added.

After normalization, we often end up with large atomic expressions
containing numerous duplicate and complex template parameter mappings.
Previously, we were substituting and checking these repeatedly,
which was highly inefficient.

We now cache these substitution results within TemplateInstantiator.
This recovers a substantial part of the regression, as the following
cases show:

Regression case                 clang-21  clang-22  This patch
usb_ids_gen.cpp                    1.41s     3.90s       2.45s
inspector_style_resolver.cpp      18.21s    22.43s      19.01s

While performance is not yet as good as in clang-21, I think there is
still room for improvement, e.g. caching invalid results for SFINAE
diagnostics and avoiding redundant pack unpacking.
---
 clang/include/clang/Sema/Sema.h            |  3 +++
 clang/lib/Sema/SemaConcept.cpp             |  7 +++++++
 clang/lib/Sema/SemaTemplateInstantiate.cpp | 23 +++++++++++++++++++++-
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index a214a7aa9147b..3f18c97fbc4d4 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -15091,6 +15091,9 @@ class Sema final : public SemaBase {
                  UnsubstitutedConstraintSatisfactionCacheResult>
       UnsubstitutedConstraintSatisfactionCache;
 
+  llvm::DenseMap<llvm::FoldingSetNodeID, TemplateArgumentLoc>
+      *CurrentCachedTemplateArgs = nullptr;
+
 private:
   /// Caches pairs of template-like decls whose associated constraints were
   /// checked for subsumption and whether or not the first's constraints did in
diff --git a/clang/lib/Sema/SemaConcept.cpp b/clang/lib/Sema/SemaConcept.cpp
index 9c4f52dd7150c..6ae678fe23700 100644
--- a/clang/lib/Sema/SemaConcept.cpp
+++ b/clang/lib/Sema/SemaConcept.cpp
@@ -487,6 +487,10 @@ class ConstraintSatisfactionChecker {
   // right context.
   ConceptDecl *ParentConcept = nullptr;
 
+public:
+  llvm::DenseMap<llvm::FoldingSetNodeID, TemplateArgumentLoc>
+      CachedTemplateArgs;
+
 private:
   ExprResult
   EvaluateAtomicConstraint(const Expr *AtomicExpr,
@@ -658,6 +662,9 @@ ConstraintSatisfactionChecker::SubstitutionInTemplateArguments(
              ? Constraint.getPackSubstitutionIndex()
              : PackSubstitutionIndex);
 
+  llvm::SaveAndRestore PushTemplateArgsCache(S.CurrentCachedTemplateArgs,
+                                             &CachedTemplateArgs);
+
   if (S.SubstTemplateArgumentsInParameterMapping(
           Constraint.getParameterMapping(), Constraint.getBeginLoc(), MLTAL,
           SubstArgs)) {
diff --git a/clang/lib/Sema/SemaTemplateInstantiate.cpp b/clang/lib/Sema/SemaTemplateInstantiate.cpp
index 34ed5dffa11b4..194d5ef0ba06a 100644
--- a/clang/lib/Sema/SemaTemplateInstantiate.cpp
+++ b/clang/lib/Sema/SemaTemplateInstantiate.cpp
@@ -1329,6 +1329,8 @@ namespace {
     // Whether an incomplete substituion should be treated as an error.
     bool BailOutOnIncomplete;
 
+    std::optional<llvm::FoldingSetNodeID> TemplateArgsHashValue;
+
     // CWG2770: Function parameters should be instantiated when they are
     // needed by a satisfaction check of an atomic constraint or
     // (recursively) by another function parameter.
@@ -1358,7 +1360,12 @@ namespace {
                          SourceLocation Loc,
                          const MultiLevelTemplateArgumentList &TemplateArgs)
         : inherited(SemaRef), TemplateArgs(TemplateArgs), Loc(Loc),
-          BailOutOnIncomplete(false) {}
+          BailOutOnIncomplete(false) {
+      auto &V = TemplateArgsHashValue.emplace();
+      for (auto &Level : TemplateArgs)
+        for (auto &Arg : Level.Args)
+          Arg.Profile(V, SemaRef.Context);
+    }
 
     /// Determine whether the given type \p T has already been
     /// transformed.
@@ -1611,6 +1618,7 @@ namespace {
       }
       return Type;
     }
+
     // Override the default version to handle a rewrite-template-arg-pack case
     // for building a deduction guide.
     bool TransformTemplateArgument(const TemplateArgumentLoc &Input,
@@ -1618,6 +1626,19 @@ namespace {
                                    bool Uneval = false) {
       const TemplateArgument &Arg = Input.getArgument();
       std::vector<TemplateArgument> TArgs;
+      if (auto *Cache = SemaRef.CurrentCachedTemplateArgs;
+          TemplateArgsHashValue && Cache) {
+        llvm::FoldingSetNodeID ID = *TemplateArgsHashValue;
+        Input.getArgument().Profile(ID, SemaRef.Context);
+        if (auto Iter = Cache->find(ID); Iter != Cache->end()) {
+          Output = Iter->second;
+          return false;
+        }
+        bool Ret = inherited::TransformTemplateArgument(Input, Output, Uneval);
+        if (!Ret)
+          Cache->insert({ID, Output});
+        return Ret;
+      }
       switch (Arg.getKind()) {
       case TemplateArgument::Pack:
         assert(SemaRef.CodeSynthesisContexts.empty() ||
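The patch exposes each ConstraintSatisfactionChecker's `CachedTemplateArgs` to the instantiator only for the duration of `SubstitutionInTemplateArguments`, by temporarily pointing `Sema::CurrentCachedTemplateArgs` at it with `llvm::SaveAndRestore`. The following is a hedged, self-contained sketch of that scoping pattern. `ScopedAssign` is a hypothetical stand-in for `llvm::SaveAndRestore`; `ToySema`, `ToyChecker` and `transformArg` are likewise hypothetical, and the cache key is simplified to the argument string alone (the real key also profiles the outer template arguments).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::SaveAndRestore: assigns a new value to a
// variable and restores the previous value when the guard goes out of scope.
template <typename T> class ScopedAssign {
public:
  ScopedAssign(T &Target, T NewValue)
      : Ref(Target), Saved(std::exchange(Target, std::move(NewValue))) {}
  ~ScopedAssign() { Ref = std::move(Saved); }
  ScopedAssign(const ScopedAssign &) = delete;
  ScopedAssign &operator=(const ScopedAssign &) = delete;

private:
  T &Ref;
  T Saved;
};

// Simplified cache: argument -> substituted result (assumption).
using ArgCache = std::map<std::string, std::string>;

// Hypothetical stand-in for Sema: points at whichever cache is active.
struct ToySema {
  ArgCache *CurrentCache = nullptr;
};

// Hypothetical stand-in for the instantiator hook: consults the active cache
// if one is installed and counts real substitutions in Work.
std::string transformArg(ToySema &S, const std::string &Arg, int &Work) {
  if (S.CurrentCache) {
    if (auto It = S.CurrentCache->find(Arg); It != S.CurrentCache->end())
      return It->second;
  }
  ++Work;
  std::string Result = "subst(" + Arg + ")";
  if (S.CurrentCache)
    S.CurrentCache->insert({Arg, Result});
  return Result;
}

// Hypothetical stand-in for ConstraintSatisfactionChecker: owns the cache and
// exposes it only while one parameter mapping is being substituted.
struct ToyChecker {
  ArgCache Cache;

  // Returns how many substitutions were actually performed.
  int substituteMapping(ToySema &S, const std::vector<std::string> &Mapping) {
    ScopedAssign<ArgCache *> Push(S.CurrentCache, &Cache);
    int Work = 0;
    for (const std::string &Arg : Mapping)
      transformArg(S, Arg, Work);
    return Work;
  } // Push restores S.CurrentCache here.
};
```

Outside `substituteMapping` the pointer is back to its previous value (here `nullptr`), so unrelated instantiations never consult a checker's cache, while repeated calls on the same checker keep reusing the entries it has accumulated.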


