[clang] [llvm] [StrTable] Switch intrinsics to StringTable and work around MSVC (PR #123548)

Reid Kleckner via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 27 12:57:30 PST 2025


================
@@ -51,28 +57,71 @@ class StringToOffsetTable {
     return II->second;
   }
 
-  // Emit the string using string literal concatenation, for better readability
-  // and searchability.
-  void EmitStringLiteralDef(raw_ostream &OS, const Twine &Decl,
-                            const Twine &Indent = "  ") const {
+  // Emit a string table definition with the provided name and indent.
+  //
+  // When possible, this uses string-literal concatenation to emit the string
+  // contents in a readable and searchable way. However, for (very) large string
+  // tables MSVC cannot reliably use string literals and so there we use a large
+  // character array. We still use a line oriented emission and add comments to
+  // provide searchability even in this case.
+  //
+  // The string table, and its input string contents, are always emitted as both
+  // `static` and `constexpr`. Both `Name` and (`Name` + "Storage") must be
+  // valid identifiers to declare.
+  void EmitStringTableDef(raw_ostream &OS, const Twine &Name,
+                          const Twine &Indent = "") const {
     OS << formatv(R"(
 #ifdef __GNUC__
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Woverlength-strings"
 #endif
-{0}{1} = )",
-                  Indent, Decl);
+{0}static constexpr char {1}Storage[] = )",
+                  Indent, Name);
+
+    // MSVC silently miscompiles string literals longer than 64k in some
+    // circumstances. When the string table is longer, emit it as an array of
+    // character literals.
+    bool UseChars = AggregateString.size() > (64 * 1024);
----------------
rnk wrote:

I believe the long (64K+) character table literals have previously been identified as a compile time bottleneck, see https://reviews.llvm.org/D73044, which is mostly about readability, but it does mention compile time as a concern. The way this is written now, it seems like Linux developers are going to pay the cost to parse these long tables character by character, which I think is worth avoiding. I think it's just a matter of moving the existing `--long-string-literals` flag from llvm/utils/TableGen/Basic/TableGen.cpp to somewhere in the TableGen library and checking it here.

https://github.com/llvm/llvm-project/pull/123548


More information about the llvm-commits mailing list