[clang] [llvm] [StrTable] Switch intrinsics to StringTable and work around MSVC (PR #123548)
Reid Kleckner via llvm-commits
llvm-commits at lists.llvm.org
Mon Jan 27 12:57:30 PST 2025
================
@@ -51,28 +57,71 @@ class StringToOffsetTable {
return II->second;
}
- // Emit the string using string literal concatenation, for better readability
- // and searchability.
- void EmitStringLiteralDef(raw_ostream &OS, const Twine &Decl,
- const Twine &Indent = " ") const {
+ // Emit a string table definition with the provided name and indent.
+ //
+ // When possible, this uses string-literal concatenation to emit the string
+ // contents in a readable and searchable way. However, for (very) large string
+ // tables MSVC cannot reliably use string literals and so there we use a large
+ // character array. We still use a line oriented emission and add comments to
+ // provide searchability even in this case.
+ //
+ // The string table, and its input string contents, are always emitted as both
+ // `static` and `constexpr`. Both `Name` and (`Name` + "Storage") must be
+ // valid identifiers to declare.
+ void EmitStringTableDef(raw_ostream &OS, const Twine &Name,
+ const Twine &Indent = "") const {
OS << formatv(R"(
#ifdef __GNUC__
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Woverlength-strings"
#endif
-{0}{1} = )",
- Indent, Decl);
+{0}static constexpr char {1}Storage[] = )",
+ Indent, Name);
+
+ // MSVC silently miscompiles string literals longer than 64k in some
+ // circumstances. When the string table is longer, emit it as an array of
+ // character literals.
+ bool UseChars = AggregateString.size() > (64 * 1024);
----------------
rnk wrote:
I believe the long (64K+) character table literals have previously been identified as a compile time bottleneck, see https://reviews.llvm.org/D73044, which is mostly about readability, but it does mention compile time as a concern. The way this is written now, it seems like Linux developers are going to pay the cost to parse these long tables character by character, which I think is worth avoiding. I think it's just a matter of moving the existing `--long-string-literals` flag from llvm/utils/TableGen/Basic/TableGen.cpp to somewhere in the TableGen library and checking it here.
https://github.com/llvm/llvm-project/pull/123548
More information about the llvm-commits
mailing list