[Lldb-commits] [clang] [lldb] [llvm] [StrTable] Switch intrinsics to StringTable and work around MSVC (PR #123548)

via lldb-commits lldb-commits at lists.llvm.org
Sun Jan 19 21:04:56 PST 2025


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-llvm-adt

@llvm/pr-subscribers-clang

Author: Chandler Carruth (chandlerc)

<details>
<summary>Changes</summary>

**Note:** This PR depends on #<!-- -->123302 and #<!-- -->123308 -- only the last of the three commits should be reviewed here.

---

Historically, the main example of *very* large string tables used the
`EmitCharArray` to work around MSVC limitations with string literals,
but that was switched (without removing the API) in order to consolidate
on a nicer emission primitive.

While this large string table in `IntrinsicsImpl.inc` seems to compile
correctly on MSVC without the work around in `EmitCharArray` (and that
this PR adds back to the nicer emission path), other users have
repeatedly hit this MSVC limitation as you can see in the discussion on
PR https://github.com/llvm/llvm-project/pull/120534. This PR teaches the string offset table emission to look at
the size of the table and switch to the char array emission strategy
when the table becomes too large.

This work around does have the downside of making compile times worse
for large string tables, but that appears unavoidable until we can
identify known good MSVC versions and switch to requiring them for all
LLVM users. It also reduces searchability of the generated string table
-- I looked at emitting a comment with each string but it is tricky
because the escaping rules for an inline comment are different from
those of of a string literal, and there's no real way to turn the string
literal into a comment.

This PR also switches the `IntrinsicsImpl.inc` string tables over to the
new `StringTable` runtime abstraction. I didn't want to do this until
landing the MSVC workaround in case it caused even this example to start
hitting the MSVC bug, but I wanted to switch here so that I could
simplify the API for emitting the string table with the workaround
present. With the two different emission strategies, its important to
use a very exact syntax and that seems better encapsulated in the API.

This PR should unblock landing https://github.com/llvm/llvm-project/pull/120534 and letting us switch all of
Clang's builtins to use string tables. That PR has all the details
motivating the overall effort.

Follow-up patches will try to consolidate the remaining users onto the
single interface, but those at least were easy to separate into
follow-ups and keep this PR somewhat smaller.

---

Patch is 37.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/123548.diff


15 Files Affected:

- (modified) clang/lib/Basic/DiagnosticIDs.cpp (+8-10) 
- (modified) clang/lib/Frontend/CompilerInvocation.cpp (+1-1) 
- (modified) clang/tools/diagtool/DiagnosticNames.cpp (+2-1) 
- (modified) clang/utils/TableGen/ClangDiagnosticsEmitter.cpp (+6-18) 
- (modified) lldb/source/Plugins/Platform/MacOSX/PlatformDarwin.cpp (+2-1) 
- (modified) llvm/include/llvm/ADT/StringTable.h (+49-1) 
- (modified) llvm/include/llvm/Option/OptTable.h (+38-29) 
- (modified) llvm/include/llvm/TableGen/StringToOffsetTable.h (+51-31) 
- (modified) llvm/lib/IR/Intrinsics.cpp (+10-9) 
- (modified) llvm/lib/Option/OptTable.cpp (+36-34) 
- (modified) llvm/test/TableGen/MixedCasedMnemonic.td (+1-1) 
- (modified) llvm/tools/llvm-objdump/llvm-objdump.cpp (+2-1) 
- (modified) llvm/unittests/Option/OptionMarshallingTest.cpp (+2-1) 
- (modified) llvm/utils/TableGen/Basic/IntrinsicEmitter.cpp (+3-4) 
- (modified) llvm/utils/TableGen/OptionParserEmitter.cpp (+5-8) 


``````````diff
diff --git a/clang/lib/Basic/DiagnosticIDs.cpp b/clang/lib/Basic/DiagnosticIDs.cpp
index d77f28c80b2eb2..55f868147134b7 100644
--- a/clang/lib/Basic/DiagnosticIDs.cpp
+++ b/clang/lib/Basic/DiagnosticIDs.cpp
@@ -16,6 +16,7 @@
 #include "clang/Basic/SourceManager.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringTable.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/Path.h"
 #include <map>
@@ -618,11 +619,7 @@ namespace {
     uint16_t SubGroups;
     StringRef Documentation;
 
-    // String is stored with a pascal-style length byte.
-    StringRef getName() const {
-      return StringRef(DiagGroupNames + NameOffset + 1,
-                       DiagGroupNames[NameOffset]);
-    }
+    StringRef getName() const { return DiagGroupNames[NameOffset]; }
   };
 }
 
@@ -669,11 +666,12 @@ StringRef DiagnosticIDs::getWarningOptionForDiag(unsigned DiagID) {
 
 std::vector<std::string> DiagnosticIDs::getDiagnosticFlags() {
   std::vector<std::string> Res{"-W", "-Wno-"};
-  for (size_t I = 1; DiagGroupNames[I] != '\0';) {
-    std::string Diag(DiagGroupNames + I + 1, DiagGroupNames[I]);
-    I += DiagGroupNames[I] + 1;
-    Res.push_back("-W" + Diag);
-    Res.push_back("-Wno-" + Diag);
+  for (StringRef Name : DiagGroupNames) {
+    if (Name.empty())
+      continue;
+
+    Res.push_back((Twine("-W") + Name).str());
+    Res.push_back((Twine("-Wno-") + Name).str());
   }
 
   return Res;
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 58658dedbaf1ee..3bf124e4827be9 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -282,7 +282,7 @@ using ArgumentConsumer = CompilerInvocation::ArgumentConsumer;
 #undef OPTTABLE_STR_TABLE_CODE
 
 static llvm::StringRef lookupStrInTable(unsigned Offset) {
-  return &OptionStrTable[Offset];
+  return OptionStrTable[Offset];
 }
 
 #define SIMPLE_ENUM_VALUE_TABLE
diff --git a/clang/tools/diagtool/DiagnosticNames.cpp b/clang/tools/diagtool/DiagnosticNames.cpp
index eb90f082437b33..1004e7bf2063b1 100644
--- a/clang/tools/diagtool/DiagnosticNames.cpp
+++ b/clang/tools/diagtool/DiagnosticNames.cpp
@@ -9,6 +9,7 @@
 #include "DiagnosticNames.h"
 #include "clang/Basic/AllDiagnostics.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringTable.h"
 
 using namespace clang;
 using namespace diagtool;
@@ -74,7 +75,7 @@ static const GroupRecord OptionTable[] = {
 };
 
 llvm::StringRef GroupRecord::getName() const {
-  return StringRef(DiagGroupNames + NameOffset + 1, DiagGroupNames[NameOffset]);
+  return DiagGroupNames[NameOffset];
 }
 
 GroupRecord::subgroup_iterator GroupRecord::subgroup_begin() const {
diff --git a/clang/utils/TableGen/ClangDiagnosticsEmitter.cpp b/clang/utils/TableGen/ClangDiagnosticsEmitter.cpp
index fb00c640d6b144..50dbe4d5a8cab7 100644
--- a/clang/utils/TableGen/ClangDiagnosticsEmitter.cpp
+++ b/clang/utils/TableGen/ClangDiagnosticsEmitter.cpp
@@ -1782,19 +1782,11 @@ static void emitDiagArrays(DiagsInGroupTy &DiagsInGroup,
 
 /// Emit a list of group names.
 ///
-/// This creates a long string which by itself contains a list of pascal style
-/// strings, which consist of a length byte directly followed by the string.
-///
-/// \code
-///   static const char DiagGroupNames[] = {
-///     \000\020#pragma-messages\t#warnings\020CFString-literal"
-///   };
-/// \endcode
+/// This creates an `llvm::StringTable` of all the diagnostic group names.
 static void emitDiagGroupNames(const StringToOffsetTable &GroupNames,
                                raw_ostream &OS) {
-  OS << "static const char DiagGroupNames[] = {\n";
-  GroupNames.EmitString(OS);
-  OS << "};\n\n";
+  GroupNames.EmitStringTableDef(OS, "DiagGroupNames");
+  OS << "\n";
 }
 
 /// Emit diagnostic arrays and related data structures.
@@ -1806,7 +1798,7 @@ static void emitDiagGroupNames(const StringToOffsetTable &GroupNames,
 ///  #ifdef GET_DIAG_ARRAYS
 ///     static const int16_t DiagArrays[];
 ///     static const int16_t DiagSubGroups[];
-///     static const char DiagGroupNames[];
+///     static constexpr llvm::StringTable DiagGroupNames;
 ///  #endif
 ///  \endcode
 static void emitAllDiagArrays(DiagsInGroupTy &DiagsInGroup,
@@ -1858,9 +1850,7 @@ static void emitDiagTable(DiagsInGroupTy &DiagsInGroup,
                                "0123456789!@#$%^*-+=:?") != std::string::npos)
       PrintFatalError("Invalid character in diagnostic group '" + Name + "'");
     OS << Name << " */, ";
-    // Store a pascal-style length byte at the beginning of the string.
-    std::string PascalName = char(Name.size()) + Name.str();
-    OS << *GroupNames.GetStringOffset(PascalName) << ", ";
+    OS << *GroupNames.GetStringOffset(Name) << ", ";
 
     // Special handling for 'pedantic'.
     const bool IsPedantic = Name == "pedantic";
@@ -1949,9 +1939,7 @@ void clang::EmitClangDiagGroups(const RecordKeeper &Records, raw_ostream &OS) {
 
   StringToOffsetTable GroupNames;
   for (const auto &[Name, Group] : DiagsInGroup) {
-    // Store a pascal-style length byte at the beginning of the string.
-    std::string PascalName = char(Name.size()) + Name.str();
-    GroupNames.GetOrAddStringOffset(PascalName, false);
+    GroupNames.GetOrAddStringOffset(Name);
   }
 
   emitAllDiagArrays(DiagsInGroup, DiagsInPedantic, GroupsInPedantic, GroupNames,
diff --git a/lldb/source/Plugins/Platform/MacOSX/PlatformDarwin.cpp b/lldb/source/Plugins/Platform/MacOSX/PlatformDarwin.cpp
index 2a36f95c94d0ce..51e9a6d81b8390 100644
--- a/lldb/source/Plugins/Platform/MacOSX/PlatformDarwin.cpp
+++ b/lldb/source/Plugins/Platform/MacOSX/PlatformDarwin.cpp
@@ -42,6 +42,7 @@
 #include "lldb/Utility/Status.h"
 #include "lldb/Utility/Timer.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringTable.h"
 #include "llvm/Support/Error.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/Threading.h"
@@ -1083,7 +1084,7 @@ void PlatformDarwin::AddClangModuleCompilationOptionsForSDKType(
   if (!version.empty() && sdk_type != XcodeSDK::Type::Linux &&
       sdk_type != XcodeSDK::Type::XROS) {
 #define OPTION(PREFIX_OFFSET, NAME_OFFSET, VAR, ...)                           \
-  llvm::StringRef opt_##VAR = &OptionStrTable[NAME_OFFSET];                    \
+  llvm::StringRef opt_##VAR = OptionStrTable[NAME_OFFSET];                     \
   (void)opt_##VAR;
 #include "clang/Driver/Options.inc"
 #undef OPTION
diff --git a/llvm/include/llvm/ADT/StringTable.h b/llvm/include/llvm/ADT/StringTable.h
index 4049f892fa66e0..5f9ecc452808de 100644
--- a/llvm/include/llvm/ADT/StringTable.h
+++ b/llvm/include/llvm/ADT/StringTable.h
@@ -10,6 +10,8 @@
 #define LLVM_ADT_STRING_TABLE_H
 
 #include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/iterator.h"
+#include <iterator>
 #include <limits>
 
 namespace llvm {
@@ -51,6 +53,14 @@ class StringTable {
     constexpr Offset() = default;
     constexpr Offset(unsigned Value) : Value(Value) {}
 
+    friend constexpr bool operator==(const Offset &LHS, const Offset &RHS) {
+      return LHS.Value == RHS.Value;
+    }
+
+    friend constexpr bool operator!=(const Offset &LHS, const Offset &RHS) {
+      return LHS.Value != RHS.Value;
+    }
+
     constexpr unsigned value() const { return Value; }
   };
 
@@ -69,9 +79,13 @@ class StringTable {
     assert(!Table.empty() && "Requires at least a valid empty string.");
     assert(Table.data()[0] == '\0' && "Offset zero must be the empty string.");
     // Ensure that `strlen` from any offset cannot overflow the end of the table
-    // by insisting on a null byte at the end.
+    // by insisting on a null byte at the end. We also insist on the last string
+    // within the table being *separately* null terminated. This structure is
+    // used to enable predictable iteration over all the strings when needed.
     assert(Table.data()[Table.size() - 1] == '\0' &&
            "Last byte must be a null byte.");
+    assert(Table.data()[Table.size() - 2] == '\0' &&
+           "Next-to-last byte must be a null byte.");
   }
 
   // Get a string from the table starting with the provided offset. The returned
@@ -84,6 +98,40 @@ class StringTable {
 
   /// Returns the byte size of the table.
   constexpr size_t size() const { return Table.size(); }
+
+  class Iterator
+      : public iterator_facade_base<Iterator, std::forward_iterator_tag,
+                                    const StringRef> {
+    friend StringTable;
+
+    const StringTable *Table;
+    Offset O;
+
+    // A cache of one value to allow `*` to return a reference.
+    mutable StringRef S;
+
+    explicit constexpr Iterator(const StringTable &Table, Offset O)
+        : Table(&Table), O(O) {}
+
+  public:
+    constexpr Iterator(const Iterator &RHS) = default;
+    constexpr Iterator(Iterator &&RHS) = default;
+
+    bool operator==(const Iterator &RHS) const {
+      assert(Table == RHS.Table && "Compared iterators for unrelated tables!");
+      return O == RHS.O;
+    }
+
+    const StringRef &operator*() const { return (S = (*Table)[O]); }
+
+    Iterator &operator++() {
+      O = O.value() + (*Table)[O].size() + 1;
+      return *this;
+    }
+  };
+
+  constexpr Iterator begin() const { return Iterator(*this, 0); }
+  constexpr Iterator end() const { return Iterator(*this, size() - 1); }
 };
 
 } // namespace llvm
diff --git a/llvm/include/llvm/Option/OptTable.h b/llvm/include/llvm/Option/OptTable.h
index 38a03fef7ae124..61a58aa304ecb4 100644
--- a/llvm/include/llvm/Option/OptTable.h
+++ b/llvm/include/llvm/Option/OptTable.h
@@ -12,6 +12,7 @@
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringTable.h"
 #include "llvm/Option/OptSpecifier.h"
 #include "llvm/Support/StringSaver.h"
 #include <cassert>
@@ -54,7 +55,7 @@ class OptTable {
   /// Entry for a single option instance in the option data table.
   struct Info {
     unsigned PrefixesOffset;
-    unsigned PrefixedNameOffset;
+    StringTable::Offset PrefixedNameOffset;
     const char *HelpText;
     // Help text for specific visibilities. A list of pairs, where each pair
     // is a list of visibilities and a specific help string for those
@@ -80,34 +81,37 @@ class OptTable {
 
     bool hasNoPrefix() const { return PrefixesOffset == 0; }
 
-    unsigned getNumPrefixes(ArrayRef<unsigned> PrefixesTable) const {
-      return PrefixesTable[PrefixesOffset];
+    unsigned getNumPrefixes(ArrayRef<StringTable::Offset> PrefixesTable) const {
+      // We embed the number of prefixes in the value of the first offset.
+      return PrefixesTable[PrefixesOffset].value();
     }
 
-    ArrayRef<unsigned>
-    getPrefixOffsets(ArrayRef<unsigned> PrefixesTable) const {
-      return hasNoPrefix() ? ArrayRef<unsigned>()
+    ArrayRef<StringTable::Offset>
+    getPrefixOffsets(ArrayRef<StringTable::Offset> PrefixesTable) const {
+      return hasNoPrefix() ? ArrayRef<StringTable::Offset>()
                            : PrefixesTable.slice(PrefixesOffset + 1,
                                                  getNumPrefixes(PrefixesTable));
     }
 
-    void appendPrefixes(const char *StrTable, ArrayRef<unsigned> PrefixesTable,
+    void appendPrefixes(const StringTable &StrTable,
+                        ArrayRef<StringTable::Offset> PrefixesTable,
                         SmallVectorImpl<StringRef> &Prefixes) const {
-      for (unsigned PrefixOffset : getPrefixOffsets(PrefixesTable))
-        Prefixes.push_back(&StrTable[PrefixOffset]);
+      for (auto PrefixOffset : getPrefixOffsets(PrefixesTable))
+        Prefixes.push_back(StrTable[PrefixOffset]);
     }
 
-    StringRef getPrefix(const char *StrTable, ArrayRef<unsigned> PrefixesTable,
+    StringRef getPrefix(const StringTable &StrTable,
+                        ArrayRef<StringTable::Offset> PrefixesTable,
                         unsigned PrefixIndex) const {
-      return &StrTable[getPrefixOffsets(PrefixesTable)[PrefixIndex]];
+      return StrTable[getPrefixOffsets(PrefixesTable)[PrefixIndex]];
     }
 
-    StringRef getPrefixedName(const char *StrTable) const {
-      return &StrTable[PrefixedNameOffset];
+    StringRef getPrefixedName(const StringTable &StrTable) const {
+      return StrTable[PrefixedNameOffset];
     }
 
-    StringRef getName(const char *StrTable,
-                      ArrayRef<unsigned> PrefixesTable) const {
+    StringRef getName(const StringTable &StrTable,
+                      ArrayRef<StringTable::Offset> PrefixesTable) const {
       unsigned PrefixLength =
           hasNoPrefix() ? 0 : getPrefix(StrTable, PrefixesTable, 0).size();
       return getPrefixedName(StrTable).drop_front(PrefixLength);
@@ -117,13 +121,13 @@ class OptTable {
 private:
   // A unified string table for these options. Individual strings are stored as
   // null terminated C-strings at offsets within this table.
-  const char *StrTable;
+  const StringTable *StrTable;
 
   // A table of different sets of prefixes. Each set starts with the number of
   // prefixes in that set followed by that many offsets into the string table
   // for each of the prefix strings. This is essentially a Pascal-string style
   // encoding.
-  ArrayRef<unsigned> PrefixesTable;
+  ArrayRef<StringTable::Offset> PrefixesTable;
 
   /// The option information table.
   ArrayRef<Info> OptionInfos;
@@ -161,7 +165,8 @@ class OptTable {
 protected:
   /// Initialize OptTable using Tablegen'ed OptionInfos. Child class must
   /// manually call \c buildPrefixChars once they are fully constructed.
-  OptTable(const char *StrTable, ArrayRef<unsigned> PrefixesTable,
+  OptTable(const StringTable &StrTable,
+           ArrayRef<StringTable::Offset> PrefixesTable,
            ArrayRef<Info> OptionInfos, bool IgnoreCase = false);
 
   /// Build (or rebuild) the PrefixChars member.
@@ -171,10 +176,12 @@ class OptTable {
   virtual ~OptTable();
 
   /// Return the string table used for option names.
-  const char *getStrTable() const { return StrTable; }
+  const StringTable &getStrTable() const { return *StrTable; }
 
   /// Return the prefixes table used for option names.
-  ArrayRef<unsigned> getPrefixesTable() const { return PrefixesTable; }
+  ArrayRef<StringTable::Offset> getPrefixesTable() const {
+    return PrefixesTable;
+  }
 
   /// Return the total number of option classes.
   unsigned getNumOptions() const { return OptionInfos.size(); }
@@ -187,25 +194,25 @@ class OptTable {
 
   /// Lookup the name of the given option.
   StringRef getOptionName(OptSpecifier id) const {
-    return getInfo(id).getName(StrTable, PrefixesTable);
+    return getInfo(id).getName(*StrTable, PrefixesTable);
   }
 
   /// Lookup the prefix of the given option.
   StringRef getOptionPrefix(OptSpecifier id) const {
     const Info &I = getInfo(id);
     return I.hasNoPrefix() ? StringRef()
-                           : I.getPrefix(StrTable, PrefixesTable, 0);
+                           : I.getPrefix(*StrTable, PrefixesTable, 0);
   }
 
   void appendOptionPrefixes(OptSpecifier id,
                             SmallVectorImpl<StringRef> &Prefixes) const {
     const Info &I = getInfo(id);
-    I.appendPrefixes(StrTable, PrefixesTable, Prefixes);
+    I.appendPrefixes(*StrTable, PrefixesTable, Prefixes);
   }
 
   /// Lookup the prefixed name of the given option.
   StringRef getOptionPrefixedName(OptSpecifier id) const {
-    return getInfo(id).getPrefixedName(StrTable);
+    return getInfo(id).getPrefixedName(*StrTable);
   }
 
   /// Get the kind of the given option.
@@ -418,19 +425,21 @@ class OptTable {
 /// Specialization of OptTable
 class GenericOptTable : public OptTable {
 protected:
-  GenericOptTable(const char *StrTable, ArrayRef<unsigned> PrefixesTable,
+  GenericOptTable(const StringTable &StrTable,
+                  ArrayRef<StringTable::Offset> PrefixesTable,
                   ArrayRef<Info> OptionInfos, bool IgnoreCase = false);
 };
 
 class PrecomputedOptTable : public OptTable {
 protected:
-  PrecomputedOptTable(const char *StrTable, ArrayRef<unsigned> PrefixesTable,
+  PrecomputedOptTable(const StringTable &StrTable,
+                      ArrayRef<StringTable::Offset> PrefixesTable,
                       ArrayRef<Info> OptionInfos,
-                      ArrayRef<unsigned> PrefixesUnionOffsets,
+                      ArrayRef<StringTable::Offset> PrefixesUnionOffsets,
                       bool IgnoreCase = false)
       : OptTable(StrTable, PrefixesTable, OptionInfos, IgnoreCase) {
-    for (unsigned PrefixOffset : PrefixesUnionOffsets)
-      PrefixesUnion.push_back(&StrTable[PrefixOffset]);
+    for (auto PrefixOffset : PrefixesUnionOffsets)
+      PrefixesUnion.push_back(StrTable[PrefixOffset]);
     buildPrefixChars();
   }
 };
diff --git a/llvm/include/llvm/TableGen/StringToOffsetTable.h b/llvm/include/llvm/TableGen/StringToOffsetTable.h
index d4bb685acce327..9cec97ed31848d 100644
--- a/llvm/include/llvm/TableGen/StringToOffsetTable.h
+++ b/llvm/include/llvm/TableGen/StringToOffsetTable.h
@@ -27,6 +27,12 @@ class StringToOffsetTable {
   std::string AggregateString;
 
 public:
+  StringToOffsetTable() {
+    // Ensure we always put the empty string at offset zero. That lets empty
+    // initialization also be zero initialization for offsets into the table.
+    GetOrAddStringOffset("");
+  }
+
   bool empty() const { return StringOffset.empty(); }
   size_t size() const { return AggregateString.size(); }
 
@@ -51,28 +57,62 @@ class StringToOffsetTable {
     return II->second;
   }
 
-  // Emit the string using string literal concatenation, for better readability
-  // and searchability.
-  void EmitStringLiteralDef(raw_ostream &OS, const Twine &Decl,
-                            const Twine &Indent = "  ") const {
+  // Emit a string table definition with the provided name and indent.
+  //
+  // When possible, this uses string-literal concatenation to emit the string
+  // contents in a readable and searchable way. However, for (very) large string
+  // tables MSVC cannot reliably use string literals and so there we use a large
+  // character array. We still use a line oriented emission and add comments to
+  // provide searchability even in this case.
+  //
+  // The string table, and its input string contents, are always emitted as both
+  // `static` and `constexpr`. Both `Name` and (`Name` + "Storage") must be
+  // valid identifiers to declare.
+  void EmitStringTableDef(raw_ostream &OS, const Twine &Name,
+                          const Twine &Indent = "") const {
     OS << formatv(R"(
 #ifdef __GNUC__
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Woverlength-strings"
 #endif
-{0}{1} = )",
-                  Indent, Decl);
+{0}static constexpr char {1}Storage[] = )",
+                  Indent, Name);
 
+    // MSVC silently miscompiles string literals longer than 64k in some
+    // circumstances. When the string table is longer, emit it as an array of
+    // character literals.
+    bool UseChars = AggregateString.size() > (64 * 1024);
+    OS << (UseChars ? "{\n" : "\n");
+
+    llvm::ListSeparator LineSep(UseChars ? ",\n" : "\n");
     for (StringRef Str : split(AggregateString, '\0')) {
-      OS << "\n" << Indent << "  \"";
-      OS.write_escaped(Str);
-      OS << "\\0\"";
+      OS << LineSep << Indent << "  ";
+      // If we can, just emit this as a string literal to be concatenated.
+      if (!UseChars) {
+        OS << "\"";
+        OS.write_escaped(Str);
+        OS << "\\0\"";
+        continue;
+      }
+
+      llvm::ListSeparator CharSep(", ");
+      for (char C : Str) {
+        OS << CharSep << "'";
+        OS.write_escaped(StringRef(&C, 1));
+        OS << "'";
+      }
+      OS << CharSep << "'\\0'";
     }
-    OS << R"(;
+    OS << LineSep << Indent << (UseChars ? "};" : "  ;");
+
+    OS << formatv(R"(
 #ifdef __GNUC__
 #pragma GCC diagnostic pop
 #endif
-)";
+
+{0}static constexpr llvm::StringTable {1} = {1}Storage;
+)",
+                  Indent, Name);
   }
 
   // Emit the string as one single string.
@@ -110,26 +150,6 @@ class StringToOffsetTable {
     }
     O << "\"";
   }
-
-  /// Emit the string using character literals. MSVC has a limitation that
-  /// string literals cannot be longer than 64K.
-  void EmitCharArray(raw_ostream &O) {
-    assert(AggregateString.find(')') == std::string::npos &&
-           "can't emit raw string with closing parens");
-    int Count = 0;
...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/123548


More information about the lldb-commits mailing list