[clang] [llvm] [StrTable] Switch intrinsics to StringTable and work around MSVC (PR #123548)
Sergei Barannikov via llvm-commits
llvm-commits at lists.llvm.org
Sun Jan 26 00:25:59 PST 2025
================
@@ -51,28 +57,71 @@ class StringToOffsetTable {
return II->second;
}
- // Emit the string using string literal concatenation, for better readability
- // and searchability.
- void EmitStringLiteralDef(raw_ostream &OS, const Twine &Decl,
- const Twine &Indent = " ") const {
+ // Emit a string table definition with the provided name and indent.
+ //
+ // When possible, this uses string-literal concatenation to emit the string
+ // contents in a readable and searchable way. However, for (very) large string
+ // tables MSVC cannot reliably use string literals and so there we use a large
+ // character array. We still use a line oriented emission and add comments to
+ // provide searchability even in this case.
+ //
+ // The string table, and its input string contents, are always emitted as both
+ // `static` and `constexpr`. Both `Name` and (`Name` + "Storage") must be
+ // valid identifiers to declare.
+ void EmitStringTableDef(raw_ostream &OS, const Twine &Name,
+ const Twine &Indent = "") const {
OS << formatv(R"(
#ifdef __GNUC__
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Woverlength-strings"
#endif
-{0}{1} = )",
- Indent, Decl);
+{0}static constexpr char {1}Storage[] = )",
+ Indent, Name);
+
+ // MSVC silently miscompiles string literals longer than 64k in some
+ // circumstances. When the string table is longer, emit it as an array of
+ // character literals.
+ bool UseChars = AggregateString.size() > (64 * 1024);
----------------
s-barannikov wrote:
> I don't know the history of why we have two offset table emission systems.
> I can try to consolidate the divergence that his grown between these two if necessary, but it's a pretty big undertaking.
`SequenceToOffsetTable` supports arbitrary sequences (e.g. register numbers). I'm not sure if consolidating them is possible or necessary.
> To me, it seems a simpler model to just unconditionally emit tables larger than 64k using the awkward syntax rather than add more tablegen flags that change behavior. This lets the vast majority of cases use the simpler form.
This is just two LoC change (declaring the option and using it in the `if`) that will avoid compile time penalty if the compiler is not MSVC and make the [large] generated tables readable/searchable. Not a big win probably, so LGTM as is.
https://github.com/llvm/llvm-project/pull/123548
More information about the llvm-commits
mailing list