[lld] [lld][MachO] Tail merge strings (PR #161262)

Ellis Hoag via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 1 10:07:19 PDT 2025


================
@@ -1754,17 +1755,57 @@ void DeduplicatedCStringSection::finalizeContents() {
       assert(isec->align != 0);
       auto align = getStringPieceAlignment(isec, piece);
       auto [it, wasInserted] = strToAlignment.try_emplace(s, align);
+      if (wasInserted)
+        deduplicatedStrs.push_back(s);
       if (!wasInserted && it->second < align)
         it->second = align;
     }
   }
 
+  // Like lexicographical sort, except we read strings in reverse and take the
+  // longest string first
+  // TODO: We could improve performance by implementing our own sort that avoids
+  // comparing characters we know to be the same. See
+  // StringTableBuilder::multikeySort() for details
+  llvm::sort(deduplicatedStrs, [](const auto &left, const auto &right) {
+    for (const auto &[leftChar, rightChar] :
+         llvm::zip(llvm::reverse(left.val()), llvm::reverse(right.val()))) {
+      if (leftChar == rightChar)
+        continue;
+      return leftChar < rightChar;
+    }
+    return left.size() > right.size();
+  });
+  std::optional<CachedHashStringRef> mergeCandidate;
+  DenseMap<CachedHashStringRef, std::pair<CachedHashStringRef, uint64_t>>
+      tailMergeMap;
+  for (auto &s : deduplicatedStrs) {
+    if (!mergeCandidate || !mergeCandidate->val().ends_with(s.val())) {
+      mergeCandidate = s;
+      continue;
+    }
+    uint64_t tailOffset = mergeCandidate->size() - s.size();
+    // TODO: If the tail offset is incompatible with this string's alignment, we
+    // might be able to find another superstring with a compatible tail offset.
+    // The difficulty is how to do this efficiently
+    const auto &align = strToAlignment.at(s);
+    if (!isAligned(align, tailOffset))
----------------
ellishg wrote:

The `isAligned(align, tailOffset)` check assumes that the superstring is laid out at address zero. If this check fails, it's impossible to tail merge with this candidate correctly.

But if the check passes, then we also need to make sure the superstring does not have a weaker alignment than the substring.

Suppose the superstring is at address `0x1004` and the substring could tail merge at address `0x1004 + 8 = 0x100c`, but requires an alignment of 8. Since `0x100c` is not a multiple of 8, that would break alignment. Instead we need to force the superstring to have alignment 8 so that it is laid out at address `0x1008`, which puts the substring at `0x1010`.
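
To make the arithmetic concrete, here is a minimal standalone sketch of the two conditions described above. It is not the code in this PR: `isAlignedTo`, `alignUp`, `MergeResult`, and `tryTailMerge` are hypothetical names invented for illustration, and alignments are assumed to be powers of two.

```cpp
#include <cstdint>

// Hypothetical helper: true if `value` is a multiple of `align`.
// Assumes `align` is a power of two.
constexpr bool isAlignedTo(uint64_t align, uint64_t value) {
  return (value & (align - 1)) == 0;
}

// Hypothetical helper: round `addr` up to the next multiple of `align`.
constexpr uint64_t alignUp(uint64_t addr, uint64_t align) {
  return (addr + align - 1) & ~(align - 1);
}

// Outcome of trying to place a substring `tailOffset` bytes into a superstring.
struct MergeResult {
  bool canMerge;       // false if the tail offset itself breaks alignment
  uint64_t superAlign; // alignment the superstring must have afterwards
};

// Condition 1: the tail offset must be a multiple of the substring's alignment.
// Condition 2: the superstring must be at least as aligned as the substring,
// so its alignment is raised to the maximum of the two.
constexpr MergeResult tryTailMerge(uint64_t superAlign, uint64_t subAlign,
                                   uint64_t tailOffset) {
  return isAlignedTo(subAlign, tailOffset)
             ? MergeResult{true, superAlign > subAlign ? superAlign : subAlign}
             : MergeResult{false, superAlign};
}

// The example from the comment: with the superstring at 0x1004, the substring
// would land at 0x100c, which is not 8-byte aligned.
static_assert(!isAlignedTo(8, 0x1004 + 8), "0x100c is not 8-byte aligned");
// Raising the superstring's alignment to 8 moves it to 0x1008, so the
// substring lands at 0x1010, which is 8-byte aligned.
static_assert(alignUp(0x1004, 8) == 0x1008, "superstring moves to 0x1008");
static_assert(isAlignedTo(8, alignUp(0x1004, 8) + 8), "0x1010 is aligned");
// The merge decision itself: offset 8 is compatible, offset 4 is not.
static_assert(tryTailMerge(4, 8, 8).canMerge, "offset 8 is compatible");
static_assert(tryTailMerge(4, 8, 8).superAlign == 8, "alignment is raised");
static_assert(!tryTailMerge(4, 8, 4).canMerge, "offset 4 is incompatible");
```

The point of the sketch is that passing the offset check alone is not sufficient: the merge is only safe once the superstring also inherits the stronger of the two alignments.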

https://github.com/llvm/llvm-project/pull/161262

