[clang] Switch builtin strings to use string tables (PR #118734)

via cfe-commits cfe-commits at lists.llvm.org
Wed Dec 4 19:10:02 PST 2024


llvmbot wrote:


@llvm/pr-subscribers-backend-sparc

Author: Chandler Carruth (chandlerc)

<details>
<summary>Changes</summary>

The Clang binary (and any binary linking Clang as a library), when built using PIE, ends up with a pretty shocking number of dynamic relocations to apply to the executable image: roughly 400k.

Each of these takes up space in the binary and, perhaps most interestingly, costs start-up time when the relocations are applied.

The largest pattern I identified was the strings used to describe target builtins. The addresses of these string literals were stored into huge arrays, each one requiring a dynamic relocation. The way to avoid this is to design the target builtins to use a single large table of strings and offsets within the table for the individual strings. This switches the builtin management to such a scheme.
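To make the contrast concrete, here is a minimal sketch of the two layouts. It is not the code from this PR: the names `NamesByPointer`, `NameTable`, `NameOffsets`, and `nameAt` are hypothetical, and the three builtin names are only sample data.

```cpp
#include <cstddef>
#include <string_view>

// Before (illustrative): an array of pointers. In a position-independent
// executable each element holds an address that the dynamic loader has to
// patch at start-up, so every entry costs one dynamic relocation.
const char *const NamesByPointer[] = {
    "__builtin_abs",
    "__builtin_fabs",
    "__builtin_memcpy",
};

// After (illustrative): one contiguous table with embedded NUL terminators,
// plus plain integer offsets. Integers contain no addresses, so the offsets
// array needs no relocations at all.
constexpr char NameTable[] = "__builtin_abs\0"
                             "__builtin_fabs\0"
                             "__builtin_memcpy";
constexpr int NameOffsets[] = {0, 14, 29};

// Look up the I-th name. Constructing a string_view from a `const char *`
// stops at the first NUL, so each lookup yields exactly one entry.
constexpr std::string_view nameAt(std::size_t I) {
  return std::string_view(&NameTable[NameOffsets[I]]);
}
```

In this sketch the only address left is the table itself, no matter how many strings it holds.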

This saves over 100k dynamic relocations by my measurement, an over 25% reduction. Just looking at byte size improvements, using the `bloaty` tool to compare a newly built `clang` binary to an old one:

```
    FILE SIZE        VM SIZE
 --------------  --------------
  +1.4%  +653Ki  +1.4%  +653Ki    .rodata
  +0.0%    +960  +0.0%    +960    .text
  +0.0%    +197  +0.0%    +197    .dynstr
  +0.0%    +184  +0.0%    +184    .eh_frame
  +0.0%     +96  +0.0%     +96    .dynsym
  +0.0%     +40  +0.0%     +40    .eh_frame_hdr
  +114%     +32  [ = ]       0    [Unmapped]
  +0.0%     +20  +0.0%     +20    .gnu.hash
  +0.0%      +8  +0.0%      +8    .gnu.version
  +0.9%      +7  +0.9%      +7    [LOAD #2 [R]]
  [ = ]       0 -75.4% -3.00Ki    .relro_padding
 -16.1%  -802Ki -16.1%  -802Ki    .data.rel.ro
 -27.3% -2.52Mi -27.3% -2.52Mi    .rela.dyn
  -1.6% -2.66Mi  -1.6% -2.66Mi    TOTAL
```

We get a 16% reduction in the `.data.rel.ro` section, and a 27% reduction in `.rela.dyn` where those relocations are stored.
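As a rough cross-check, the `.rela.dyn` delta can be converted back into a relocation count. This sketch assumes a 64-bit ELF target whose `.rela.dyn` holds `Elf64_Rela` entries of 24 bytes each (an assumption, since the target is not stated here). The bloaty figures are rounded, so the results are approximate, and all constant names are illustrative.

```cpp
#include <cstdint>

// Assumption: each Elf64_Rela entry is three 8-byte fields
// (r_offset, r_info, r_addend), i.e. 24 bytes.
constexpr std::int64_t RelaEntryBytes = 24;

// `.rela.dyn` shrank by 2.52 MiB according to the bloaty output.
constexpr std::int64_t RelaDynSavedBytes = 252 * 1024 * 1024 / 100;

// That shrinkage was reported as 27.3% of the old section size, so the old
// size is roughly the saved bytes divided by 0.273.
constexpr std::int64_t RelaDynOldBytes = RelaDynSavedBytes * 1000 / 273;

// Estimated number of relocations removed, and the estimated old total.
constexpr std::int64_t RelocationsRemoved = RelaDynSavedBytes / RelaEntryBytes;
constexpr std::int64_t RelocationsBefore = RelaDynOldBytes / RelaEntryBytes;
```

Under those assumptions the estimate comes out near 110k removed relocations out of roughly 400k, which lines up with the figures quoted above.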

This is also visible, at least in my benchmarking of binary start-up overhead:

```
Benchmark 1: ./old_clang --version
  Time (mean ± σ):      17.6 ms ±   1.5 ms    [User: 4.1 ms, System: 13.3 ms]
  Range (min … max):    14.2 ms …  22.8 ms    162 runs

Benchmark 2: ./new_clang --version
  Time (mean ± σ):      15.5 ms ±   1.4 ms    [User: 3.6 ms, System: 11.8 ms]
  Range (min … max):    12.4 ms …  20.3 ms    216 runs

Summary
  './new_clang --version' ran
    1.13 ± 0.14 times faster than './old_clang --version'
```

`--version` runs get about 2 ms faster. While there is a lot of noise in binary execution time, this delta is pretty consistent and represents an improvement of over 10%. This is particularly interesting to me because for very short source files, repeatedly starting the `clang` binary is actually the dominant cost. For example, `configure` scripts running against the `clang` compiler are slow in large part because of binary start-up time, not the time to process the actual inputs to the compiler.

----

This PR implements the string tables using `constexpr` code and the existing macro system. I understand that the builtins are moving towards a TableGen model, and once complete that would provide more options for modeling this. Unfortunately, that migration isn't complete, and even the parts that are migrated still rely on the ability to break out of the TableGen model and directly expand an X-macro style `BUILTIN(...)` textually. I looked at trying to complete the move to TableGen, but it would require both the difficult migration of the remaining targets and solutions to some tricky problems with how to move away from any macro-based expansion.

I was also able to find a reasonably clean and effective way of doing this with the existing macros and some `constexpr` code that I think is clean enough to be a pretty good intermediate state, and maybe give a good target for the eventual TableGen solution. I could also factor the macros into a set of consistent patterns that avoids a significant regression in overall boilerplate.
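The general shape of that approach can be sketched in a few lines. This is a simplified, self-contained illustration rather than the PR's actual macros: `TOY_BUILTINS`, `ToyInfo`, `lengthsToOffsets`, `toyName`, and `toyType` are hypothetical names, the entries are made up, and only two strings per builtin are modeled. It expands one X-macro list twice, once into a single string literal and once into per-entry lengths, then converts the lengths to offsets in a `constexpr` function.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for a `.def` file: each entry is (NAME, TYPE).
#define TOY_BUILTINS(X) \
  X(toy_abs, "ii") \
  X(toy_fabs, "dd") \
  X(toy_memcpy, "v*v*vC*z")

struct ToyInfo {
  int Name; // Filled with a length (including the NUL), later an offset.
  int Type;
};

// First expansion: concatenate every string, NUL-terminated, into one literal.
#define TOY_STR_TABLE(ID, TYPE) #ID "\0" TYPE "\0"
constexpr const char *ToyStrings = TOY_BUILTINS(TOY_STR_TABLE);

// Count the entries so the array size is known at compile time.
#define TOY_COUNT(ID, TYPE) +1
constexpr std::size_t ToyCount = 0 TOY_BUILTINS(TOY_COUNT);

// Second expansion: one ToyInfo per entry holding string *lengths*.
#define TOY_ENTRY(ID, TYPE) \
  ToyInfo{int(std::string_view(#ID).size()) + 1, \
          int(std::string_view(TYPE).size()) + 1},

// Replace each length with the running sum of all previous lengths, which is
// the offset of that string within the concatenated table.
template <std::size_t N>
constexpr std::array<ToyInfo, N> lengthsToOffsets(std::array<ToyInfo, N> Infos) {
  int Offset = 0;
  for (auto &I : Infos) {
    ToyInfo New{};
    New.Name = Offset;
    Offset += I.Name;
    New.Type = Offset;
    Offset += I.Type;
    I = New;
  }
  return Infos;
}

constexpr auto ToyInfos =
    lengthsToOffsets<ToyCount>({{TOY_BUILTINS(TOY_ENTRY)}});

// Accessors read a NUL-terminated string starting at the stored offset.
constexpr std::string_view toyName(std::size_t I) {
  return std::string_view(ToyStrings + ToyInfos[I].Name);
}
constexpr std::string_view toyType(std::size_t I) {
  return std::string_view(ToyStrings + ToyInfos[I].Type);
}
```

Because both expansions walk the same list in the same order, the offsets stay in sync with the table without any hand-maintained indices. In this sketch `toyName(1)` refers to `toy_fabs` at offset 11.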

There is one challenge with this approach: it requires the host compiler to support (very) long string literals, a bit over half a meg. =/ The current minimum MSVC version rejects these, but the very next patch release (16.8) removed that restriction. I'm going to send out a separate PR / RFC to raise the minimum version by one patch release, which I hope is acceptable as the current version was set years ago.

FWIW, there are a few more low-hanging fruit sources of excessive dynamic relocations, maybe as many as 50k to 100k more that I'll take a look at to see if I can identify easy fixes. Beyond that, it seems to get quite difficult. It might be worth adding some guidance to developer documentation to try to avoid creating global data structures that _repeatedly_ store pointers to other globals.

---

Patch is 77.35 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118734.diff


48 Files Affected:

- (modified) clang/include/clang/Basic/Builtins.h (+174-36) 
- (modified) clang/include/clang/Basic/BuiltinsPPC.def (+1) 
- (modified) clang/include/clang/Basic/TargetInfo.h (+5-1) 
- (modified) clang/lib/Basic/Builtins.cpp (+81-39) 
- (modified) clang/lib/Basic/Targets/AArch64.cpp (+32-28) 
- (modified) clang/lib/Basic/Targets/AArch64.h (+2-1) 
- (modified) clang/lib/Basic/Targets/AMDGPU.cpp (+14-9) 
- (modified) clang/lib/Basic/Targets/AMDGPU.h (+2-1) 
- (modified) clang/lib/Basic/Targets/ARC.h (+4) 
- (modified) clang/lib/Basic/Targets/ARM.cpp (+24-21) 
- (modified) clang/lib/Basic/Targets/ARM.h (+2-1) 
- (modified) clang/lib/Basic/Targets/AVR.h (+4-1) 
- (modified) clang/lib/Basic/Targets/BPF.cpp (+12-7) 
- (modified) clang/lib/Basic/Targets/BPF.h (+2-1) 
- (modified) clang/lib/Basic/Targets/CSKY.cpp (-4) 
- (modified) clang/lib/Basic/Targets/CSKY.h (+4-1) 
- (modified) clang/lib/Basic/Targets/DirectX.h (+4-1) 
- (modified) clang/lib/Basic/Targets/Hexagon.cpp (+15-11) 
- (modified) clang/lib/Basic/Targets/Hexagon.h (+2-1) 
- (modified) clang/lib/Basic/Targets/Lanai.h (+4-1) 
- (modified) clang/lib/Basic/Targets/LoongArch.cpp (+14-9) 
- (modified) clang/lib/Basic/Targets/LoongArch.h (+2-1) 
- (modified) clang/lib/Basic/Targets/M68k.cpp (+3-2) 
- (modified) clang/lib/Basic/Targets/M68k.h (+2-1) 
- (modified) clang/lib/Basic/Targets/MSP430.h (+3-2) 
- (modified) clang/lib/Basic/Targets/Mips.cpp (+13-9) 
- (modified) clang/lib/Basic/Targets/Mips.h (+2-1) 
- (modified) clang/lib/Basic/Targets/NVPTX.cpp (+15-11) 
- (modified) clang/lib/Basic/Targets/NVPTX.h (+2-1) 
- (modified) clang/lib/Basic/Targets/PNaCl.h (+4-1) 
- (modified) clang/lib/Basic/Targets/PPC.cpp (+15-11) 
- (modified) clang/lib/Basic/Targets/PPC.h (+2-1) 
- (modified) clang/lib/Basic/Targets/RISCV.cpp (+19-13) 
- (modified) clang/lib/Basic/Targets/RISCV.h (+2-1) 
- (modified) clang/lib/Basic/Targets/SPIR.cpp (+3-2) 
- (modified) clang/lib/Basic/Targets/SPIR.h (+6-2) 
- (modified) clang/lib/Basic/Targets/Sparc.h (+3-2) 
- (modified) clang/lib/Basic/Targets/SystemZ.cpp (+14-9) 
- (modified) clang/lib/Basic/Targets/SystemZ.h (+2-1) 
- (modified) clang/lib/Basic/Targets/TCE.h (+4-1) 
- (modified) clang/lib/Basic/Targets/VE.cpp (+12-7) 
- (modified) clang/lib/Basic/Targets/VE.h (+2-1) 
- (modified) clang/lib/Basic/Targets/WebAssembly.cpp (+15-11) 
- (modified) clang/lib/Basic/Targets/WebAssembly.h (+2-1) 
- (modified) clang/lib/Basic/Targets/X86.cpp (+42-26) 
- (modified) clang/lib/Basic/Targets/X86.h (+4-2) 
- (modified) clang/lib/Basic/Targets/XCore.cpp (+13-9) 
- (modified) clang/lib/Basic/Targets/XCore.h (+2-1) 


``````````diff
diff --git a/clang/include/clang/Basic/Builtins.h b/clang/include/clang/Basic/Builtins.h
index 89f65682ae5b41..47729456380c43 100644
--- a/clang/include/clang/Basic/Builtins.h
+++ b/clang/include/clang/Basic/Builtins.h
@@ -55,6 +55,7 @@ struct HeaderDesc {
 #undef HEADER
   } ID;
 
+  constexpr HeaderDesc() : ID() {}
   constexpr HeaderDesc(HeaderID ID) : ID(ID) {}
 
   const char *getName() const;
@@ -68,14 +69,144 @@ enum ID {
   FirstTSBuiltin
 };
 
+// The info used to represent each builtin.
 struct Info {
-  llvm::StringLiteral Name;
-  const char *Type, *Attributes;
-  const char *Features;
+  // Rather than store pointers to the string literals describing these four
+  // aspects of builtins, we store offsets into a common string table.
+  struct StrOffsets {
+    int Name;
+    int Type;
+    int Attributes;
+    int Features;
+  } Offsets;
+
   HeaderDesc Header;
   LanguageID Langs;
 };
 
+// The storage for `N` builtins. This contains a single pointer to the string
+// table used for these builtins and an array of metadata for each builtin.
+template <size_t N> struct Storage {
+  const char *StringTable;
+
+  std::array<Info, N> Infos;
+
+  // A constexpr function to construct the storage for a a given string table in
+  // the first argument and an array in the second argument. This is *only*
+  // expected to be used at compile time, we should mark it `consteval` when
+  // available.
+  //
+  // The `Infos` array is particularly special. This function expects an array
+  // of `Info` structs, where the string offsets of each entry refer to the
+  // *sizes* of those strings rather than their offsets, and for the target
+  // string to be in the provided string table at an offset the sum of all
+  // previous string sizes. This function walks the `Infos` array computing the
+  // running sum and replacing the sizes with the actual offsets in the string
+  // table that should be used. This arrangement is designed to make it easy to
+  // expand `.def` and `.inc` files with X-macros to construct both the string
+  // table and the `Info` structs in the arguments to this function.
+  static constexpr auto Make(const char *Strings,
+                             std::array<Info, N> Infos) -> Storage<N> {
+    // Translate lengths to offsets.
+    int Offset = 0;
+    for (auto &I : Infos) {
+      Info::StrOffsets NewOffsets = {};
+      NewOffsets.Name = Offset;
+      Offset += I.Offsets.Name;
+      NewOffsets.Type = Offset;
+      Offset += I.Offsets.Type;
+      NewOffsets.Attributes = Offset;
+      Offset += I.Offsets.Attributes;
+      NewOffsets.Features = Offset;
+      Offset += I.Offsets.Features;
+      I.Offsets = NewOffsets;
+    }
+    return {Strings, Infos};
+  }
+};
+
+// A detail macro used below to emit a string literal that, after string literal
+// concatenation, ends up triggering the `-Woverlength-strings` warning. While
+// the warning is useful in general to catch accidentally excessive strings,
+// here we are creating them intentionally.
+//
+// This relies on a subtle aspect of `_Pragma`: that the *diagnostic* ones don't
+// turn into actual tokens that would disrupt string literal concatenation.
+#ifdef __clang__
+#define CLANG_BUILTIN_DETAIL_STR_TABLE(S)                                      \
+  _Pragma("clang diagnostic push")                                             \
+      _Pragma("clang diagnostic ignored \"-Woverlength-strings\"")             \
+          S _Pragma("clang diagnostic pop")
+#else
+#define CLANG_BUILTIN_DETAIL_STR_TABLE(S) S
+#endif
+
+// A macro that can be used with `Builtins.def` and similar files as an X-macro
+// to add the string arguments to a builtin string table. This is typically the
+// target for the `BUILTIN`, `LANGBUILTIN`, or `LIBBUILTIN` macros in those
+// files.
+#define CLANG_BUILTIN_STR_TABLE(ID, TYPE, ATTRS)                               \
+  CLANG_BUILTIN_DETAIL_STR_TABLE(#ID "\0" TYPE "\0" ATTRS "\0" /*FEATURE*/ "\0")
+
+// A macro that can be used with target builtin `.def` and `.inc` files as an
+// X-macro to add the string arguments to a builtin string table. this is
+// typically the target for the `TARGET_BUILTIN` macro.
+#define CLANG_TARGET_BUILTIN_STR_TABLE(ID, TYPE, ATTRS, FEATURE)               \
+  CLANG_BUILTIN_DETAIL_STR_TABLE(#ID "\0" TYPE "\0" ATTRS "\0" FEATURE "\0")
+
+// A macro that can be used with target builtin `.def` and `.inc` files as an
+// X-macro to add the string arguments to a builtin string table. this is
+// typically the target for the `TARGET_HEADER_BUILTIN` macro. We can't delegate
+// to `TARGET_BUILTIN` because the `FEATURE` string changes position.
+#define CLANG_TARGET_HEADER_BUILTIN_STR_TABLE(ID, TYPE, ATTRS, HEADER, LANGS,  \
+                                              FEATURE)                         \
+  CLANG_BUILTIN_DETAIL_STR_TABLE(#ID "\0" TYPE "\0" ATTRS "\0" FEATURE "\0")
+
+// A detail macro used internally to compute the desired string table
+// `StrOffsets` struct for arguments to `Storage::Make`.
+#define CLANG_BUILTIN_DETAIL_STR_OFFSETS(ID, TYPE, ATTRS)                      \
+  Builtin::Info::StrOffsets {                                                  \
+    llvm::StringLiteral(#ID).size() + 1, llvm::StringLiteral(TYPE).size() + 1, \
+        llvm::StringLiteral(ATTRS).size() + 1,                                 \
+        llvm::StringLiteral("").size() + 1                                     \
+  }
+
+// A detail macro used internally to compute the desired string table
+// `StrOffsets` struct for arguments to `Storage::Make`.
+#define CLANG_TARGET_BUILTIN_DETAIL_STR_OFFSETS(ID, TYPE, ATTRS, FEATURE)      \
+  Builtin::Info::StrOffsets {                                                  \
+    llvm::StringLiteral(#ID).size() + 1, llvm::StringLiteral(TYPE).size() + 1, \
+        llvm::StringLiteral(ATTRS).size() + 1,                                 \
+        llvm::StringLiteral(FEATURE).size() + 1                                \
+  }
+
+// A set of macros that can be used with builtin `.def' files as an X-macro to
+// create an `Info` struct for a particular builtin. It both computes the
+// `StrOffsets` value for the string table (the lengths here, translated to
+// offsets by the Storage::Make function), and the other metadata for each
+// builtin.
+//
+// There is a corresponding macro for each of `BUILTIN`, `LANGBUILTIN`,
+// `LIBBUILTIN`, `TARGET_BUILTIN`, and `TARGET_HEADER_BUILTIN`.
+#define CLANG_BUILTIN_ENTRY(ID, TYPE, ATTRS)                                   \
+  Builtin::Info{CLANG_BUILTIN_DETAIL_STR_OFFSETS(ID, TYPE, ATTRS),             \
+                HeaderDesc::NO_HEADER, ALL_LANGUAGES},
+#define CLANG_LANGBUILTIN_ENTRY(ID, TYPE, ATTRS, LANG)                         \
+  Builtin::Info{CLANG_BUILTIN_DETAIL_STR_OFFSETS(ID, TYPE, ATTRS),             \
+                HeaderDesc::NO_HEADER, LANG},
+#define CLANG_LIBBUILTIN_ENTRY(ID, TYPE, ATTRS, HEADER, LANG)                  \
+  Builtin::Info{CLANG_BUILTIN_DETAIL_STR_OFFSETS(ID, TYPE, ATTRS),             \
+                HeaderDesc::HEADER, LANG},
+#define CLANG_TARGET_BUILTIN_ENTRY(ID, TYPE, ATTRS, FEATURE)                   \
+  Builtin::Info{                                                               \
+      CLANG_TARGET_BUILTIN_DETAIL_STR_OFFSETS(ID, TYPE, ATTRS, FEATURE),       \
+      HeaderDesc::NO_HEADER, ALL_LANGUAGES},
+#define CLANG_TARGET_HEADER_BUILTIN_ENTRY(ID, TYPE, ATTRS, HEADER, LANG,       \
+                                          FEATURE)                             \
+  Builtin::Info{                                                               \
+      CLANG_TARGET_BUILTIN_DETAIL_STR_OFFSETS(ID, TYPE, ATTRS, FEATURE),       \
+      HeaderDesc::HEADER, LANG},
+
 /// Holds information about both target-independent and
 /// target-specific builtins, allowing easy queries by clients.
 ///
@@ -83,8 +214,11 @@ struct Info {
 /// AuxTSRecords. Their IDs are shifted up by TSRecords.size() and need to
 /// be translated back with getAuxBuiltinID() before use.
 class Context {
-  llvm::ArrayRef<Info> TSRecords;
-  llvm::ArrayRef<Info> AuxTSRecords;
+  const char *TSStrTable = nullptr;
+  const char *AuxTSStrTable = nullptr;
+
+  llvm::ArrayRef<Info> TSInfos;
+  llvm::ArrayRef<Info> AuxTSInfos;
 
 public:
   Context() = default;
@@ -100,12 +234,13 @@ class Context {
 
   /// Return the identifier name for the specified builtin,
   /// e.g. "__builtin_abs".
-  llvm::StringRef getName(unsigned ID) const { return getRecord(ID).Name; }
+  llvm::StringRef getName(unsigned ID) const;
 
   /// Get the type descriptor string for the specified builtin.
-  const char *getTypeString(unsigned ID) const {
-    return getRecord(ID).Type;
-  }
+  const char *getTypeString(unsigned ID) const;
+
+  /// Get the attributes descriptor string for the specified builtin.
+  const char *getAttributesString(unsigned ID) const;
 
   /// Return true if this function is a target-specific builtin.
   bool isTSBuiltin(unsigned ID) const {
@@ -114,40 +249,40 @@ class Context {
 
   /// Return true if this function has no side effects.
   bool isPure(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 'U') != nullptr;
+    return strchr(getAttributesString(ID), 'U') != nullptr;
   }
 
   /// Return true if this function has no side effects and doesn't
   /// read memory.
   bool isConst(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 'c') != nullptr;
+    return strchr(getAttributesString(ID), 'c') != nullptr;
   }
 
   /// Return true if we know this builtin never throws an exception.
   bool isNoThrow(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 'n') != nullptr;
+    return strchr(getAttributesString(ID), 'n') != nullptr;
   }
 
   /// Return true if we know this builtin never returns.
   bool isNoReturn(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 'r') != nullptr;
+    return strchr(getAttributesString(ID), 'r') != nullptr;
   }
 
   /// Return true if we know this builtin can return twice.
   bool isReturnsTwice(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 'j') != nullptr;
+    return strchr(getAttributesString(ID), 'j') != nullptr;
   }
 
   /// Returns true if this builtin does not perform the side-effects
   /// of its arguments.
   bool isUnevaluated(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 'u') != nullptr;
+    return strchr(getAttributesString(ID), 'u') != nullptr;
   }
 
   /// Return true if this is a builtin for a libc/libm function,
   /// with a "__builtin_" prefix (e.g. __builtin_abs).
   bool isLibFunction(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 'F') != nullptr;
+    return strchr(getAttributesString(ID), 'F') != nullptr;
   }
 
   /// Determines whether this builtin is a predefined libc/libm
@@ -158,21 +293,21 @@ class Context {
   /// they do not, but they are recognized as builtins once we see
   /// a declaration.
   bool isPredefinedLibFunction(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 'f') != nullptr;
+    return strchr(getAttributesString(ID), 'f') != nullptr;
   }
 
   /// Returns true if this builtin requires appropriate header in other
   /// compilers. In Clang it will work even without including it, but we can emit
   /// a warning about missing header.
   bool isHeaderDependentFunction(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 'h') != nullptr;
+    return strchr(getAttributesString(ID), 'h') != nullptr;
   }
 
   /// Determines whether this builtin is a predefined compiler-rt/libgcc
   /// function, such as "__clear_cache", where we know the signature a
   /// priori.
   bool isPredefinedRuntimeFunction(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 'i') != nullptr;
+    return strchr(getAttributesString(ID), 'i') != nullptr;
   }
 
   /// Determines whether this builtin is a C++ standard library function
@@ -180,7 +315,7 @@ class Context {
   /// specialization, where the signature is determined by the standard library
   /// declaration.
   bool isInStdNamespace(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 'z') != nullptr;
+    return strchr(getAttributesString(ID), 'z') != nullptr;
   }
 
   /// Determines whether this builtin can have its address taken with no
@@ -194,33 +329,33 @@ class Context {
 
   /// Determines whether this builtin has custom typechecking.
   bool hasCustomTypechecking(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 't') != nullptr;
+    return strchr(getAttributesString(ID), 't') != nullptr;
   }
 
   /// Determines whether a declaration of this builtin should be recognized
   /// even if the type doesn't match the specified signature.
   bool allowTypeMismatch(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 'T') != nullptr ||
+    return strchr(getAttributesString(ID), 'T') != nullptr ||
            hasCustomTypechecking(ID);
   }
 
   /// Determines whether this builtin has a result or any arguments which
   /// are pointer types.
   bool hasPtrArgsOrResult(unsigned ID) const {
-    return strchr(getRecord(ID).Type, '*') != nullptr;
+    return strchr(getTypeString(ID), '*') != nullptr;
   }
 
   /// Return true if this builtin has a result or any arguments which are
   /// reference types.
   bool hasReferenceArgsOrResult(unsigned ID) const {
-    return strchr(getRecord(ID).Type, '&') != nullptr ||
-           strchr(getRecord(ID).Type, 'A') != nullptr;
+    return strchr(getTypeString(ID), '&') != nullptr ||
+           strchr(getTypeString(ID), 'A') != nullptr;
   }
 
   /// If this is a library function that comes from a specific
   /// header, retrieve that header name.
   const char *getHeaderName(unsigned ID) const {
-    return getRecord(ID).Header.getName();
+    return getInfo(ID).Header.getName();
   }
 
   /// Determine whether this builtin is like printf in its
@@ -245,27 +380,25 @@ class Context {
   /// Such functions can be const when the MathErrno lang option and FP
   /// exceptions are disabled.
   bool isConstWithoutErrnoAndExceptions(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 'e') != nullptr;
+    return strchr(getAttributesString(ID), 'e') != nullptr;
   }
 
   bool isConstWithoutExceptions(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 'g') != nullptr;
+    return strchr(getAttributesString(ID), 'g') != nullptr;
   }
 
-  const char *getRequiredFeatures(unsigned ID) const {
-    return getRecord(ID).Features;
-  }
+  const char *getRequiredFeatures(unsigned ID) const;
 
   unsigned getRequiredVectorWidth(unsigned ID) const;
 
   /// Return true if builtin ID belongs to AuxTarget.
   bool isAuxBuiltinID(unsigned ID) const {
-    return ID >= (Builtin::FirstTSBuiltin + TSRecords.size());
+    return ID >= (Builtin::FirstTSBuiltin + TSInfos.size());
   }
 
   /// Return real builtin ID (i.e. ID it would have during compilation
   /// for AuxTarget).
-  unsigned getAuxBuiltinID(unsigned ID) const { return ID - TSRecords.size(); }
+  unsigned getAuxBuiltinID(unsigned ID) const { return ID - TSInfos.size(); }
 
   /// Returns true if this is a libc/libm function without the '__builtin_'
   /// prefix.
@@ -277,16 +410,21 @@ class Context {
 
   /// Return true if this function can be constant evaluated by Clang frontend.
   bool isConstantEvaluated(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 'E') != nullptr;
+    return strchr(getAttributesString(ID), 'E') != nullptr;
   }
 
   /// Returns true if this is an immediate (consteval) function
   bool isImmediate(unsigned ID) const {
-    return strchr(getRecord(ID).Attributes, 'G') != nullptr;
+    return strchr(getAttributesString(ID), 'G') != nullptr;
   }
 
 private:
-  const Info &getRecord(unsigned ID) const;
+  auto getStrTableAndInfo(unsigned ID) const
+      -> std::pair<const char *, const Info &>;
+
+  const Info &getInfo(unsigned ID) const {
+    return getStrTableAndInfo(ID).second;
+  }
 
   /// Helper function for isPrintfLike and isScanfLike.
   bool isLike(unsigned ID, unsigned &FormatIdx, bool &HasVAListArg,
diff --git a/clang/include/clang/Basic/BuiltinsPPC.def b/clang/include/clang/Basic/BuiltinsPPC.def
index 161df386f00f03..bb7d54bbb793eb 100644
--- a/clang/include/clang/Basic/BuiltinsPPC.def
+++ b/clang/include/clang/Basic/BuiltinsPPC.def
@@ -1138,5 +1138,6 @@ UNALIASED_CUSTOM_BUILTIN(mma_pmxvbf16ger2nn, "vW512*VVi15i15i3", true,
 // FIXME: Obviously incomplete.
 
 #undef BUILTIN
+#undef TARGET_BUILTIN
 #undef CUSTOM_BUILTIN
 #undef UNALIASED_CUSTOM_BUILTIN
diff --git a/clang/include/clang/Basic/TargetInfo.h b/clang/include/clang/Basic/TargetInfo.h
index 4420228793e95f..44fc0a08735f14 100644
--- a/clang/include/clang/Basic/TargetInfo.h
+++ b/clang/include/clang/Basic/TargetInfo.h
@@ -16,6 +16,7 @@
 
 #include "clang/Basic/AddressSpaces.h"
 #include "clang/Basic/BitmaskEnum.h"
+#include "clang/Basic/Builtins.h"
 #include "clang/Basic/CFProtectionOptions.h"
 #include "clang/Basic/CodeGenOptions.h"
 #include "clang/Basic/LLVM.h"
@@ -1013,7 +1014,10 @@ class TargetInfo : public TransferrableTargetInfo,
   /// Return information about target-specific builtins for
   /// the current primary target, and info about which builtins are non-portable
   /// across the current set of primary and secondary targets.
-  virtual ArrayRef<Builtin::Info> getTargetBuiltins() const = 0;
+  virtual ArrayRef<Builtin::Info> getTargetBuiltins() const { return {}; };
+
+  virtual auto getTargetBuiltinStorage() const
+      -> std::pair<const char *, ArrayRef<Builtin::Info>> = 0;
 
   /// Returns target-specific min and max values VScale_Range.
   virtual std::optional<std::pair<unsigned, unsigned>>
diff --git a/clang/lib/Basic/Builtins.cpp b/clang/lib/Basic/Builtins.cpp
index 25a601573698e7..d1137cb6b6f13a 100644
--- a/clang/lib/Basic/Builtins.cpp
+++ b/clang/lib/Basic/Builtins.cpp
@@ -29,54 +29,93 @@ const char *HeaderDesc::getName() const {
   llvm_unreachable("Unknown HeaderDesc::HeaderID enum");
 }
 
-static constexpr Builtin::Info BuiltinInfo[] = {
-    {"not a builtin function", nullptr, nullptr, nullptr, HeaderDesc::NO_HEADER,
-     ALL_LANGUAGES},
-#define BUILTIN(ID, TYPE, ATTRS)                                               \
-  {#ID, TYPE, ATTRS, nullptr, HeaderDesc::NO_HEADER, ALL_LANGUAGES},
-#define LANGBUILTIN(ID, TYPE, ATTRS, LANGS)                                    \
-  {#ID, TYPE, ATTRS, nullptr, HeaderDesc::NO_HEADER, LANGS},
-#define LIBBUILTIN(ID, TYPE, ATTRS, HEADER, LANGS)                             \
-  {#ID, TYPE, ATTRS, nullptr, HeaderDesc::HEADER, LANGS},
+static constexpr auto BuiltinStorage =
+    Builtin::Storage<Builtin::FirstTSBuiltin>::Make(
+        CLANG_BUILTIN_STR_TABLE("not a builtin function", "", "")
+#define BUILTIN CLANG_BUILTIN_STR_TABLE
 #include "clang/Basic/Builtins.inc"
-};
+            ,
+        {CLANG_BUILTIN_ENTRY("not a builtin function", "", "")
+#define BUILTIN CLANG_BUILTIN_ENTRY
+#define LANGBUILTIN CLANG_LANGBUILTIN_ENTRY
+#define LIBBUILTIN CLANG_LIBBUILTIN_ENTRY
+#include "clang/Basic/Builtins.inc"
+        });
 
-const Builtin::Info &Builtin::Context::getRecord(unsigned ID) const {
+auto Builtin::Context::getStrTableAndInfo(unsigned ID) const
+    -> std::pair<const char *, const Info &> {
   if (ID < Builtin::FirstTSBuiltin)
-    return BuiltinInfo[ID];
-  assert(((ID - Builtin::FirstTSBuiltin) <
-          (TSRecords.size() + AuxTSRecords.size())) &&
-         "Invalid builtin ID!");
+    return {BuiltinStorage.StringTable, BuiltinStorage.Infos[ID]};
+  assert(
+      ((ID - Builtin::FirstTSBuiltin) < (TSInfos.size() + AuxTSInfos.size())) &&
+      "Invalid builtin ID!");
   if (isAuxBuiltinID(ID))
-    return AuxTSRecords[getAuxBuiltinID(ID) - Builtin::FirstTSBuiltin];
-  return TSRecords[ID - Builtin::FirstTSBuiltin];
+    return {AuxTSStrTable,
+            AuxTSInfos[getAuxBuiltinID(ID) - Builtin::FirstTSBuiltin]};
+  return {TSStrTable, TSInfos[ID - Builtin::FirstTSBuiltin]};
+}
+
+static llvm::StringRef getStrFromTable(const char *StrTable, int Offset) {
+  return &StrTable[Offset];
+}
+
+/// Return the identifier name for the specified builtin,
+/// e.g. "__builtin_abs".
+llvm::StringRef Builtin::Context::getName(unsigned ID) const {
+  const auto &[StrTable, I] = getStrTableAndInfo(ID);
+  return getStrFromTable(StrTable, I.Offsets.Name);
+}
+
+const char *Builti...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/118734

