[llvm] Fix issues with GlobalMerge on Mach-O. (PR #110046)

James Y Knight via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 26 08:47:34 PDT 2024


https://github.com/jyknight updated https://github.com/llvm/llvm-project/pull/110046

>From e4ddbbbe3c280e73c774621c39b437715b16cbff Mon Sep 17 00:00:00 2001
From: James Y Knight <jyknight at google.com>
Date: Wed, 25 Sep 2024 17:24:29 -0400
Subject: [PATCH 1/2] Fix issues with GlobalMerge on Mach-O.

After PR #101222, we GlobalMerge started making transforms which are unsafe on Mach-O platforms.

Two issues, in particular, are fixed here:

1. We must never merge symbols in the `__cfstring` section, as the linker assumes each object in this section is only ever referenced directly, and that it can split the section as it likes.

Previously, we avoded this problem because CFString literals are identified by private-linkage symbols.

This patch adds a list of section-names to avoid merging, under Mach-O.

2. When the code was originally written, it had to be careful about emitting symbol aliases, due to issues with Mach-O's subsection splitting in the linker with `-dead_strip` enabled. This issue was fixed in 2016, via creation of the `.alt_entry` assembler directive, which allows creation of a symbol also implying the start of a new subsection. Unfortunately, Apple's ld-prime's support for this is buggy.

Therefore, we must _continue_ to be careful not to emit such symbol aliases. The code already avoided it for InternalLinkage symbols, but after the triggering PR, we also need to avoid it for PrivateLinkage symbols.

I will file an Apple bug-report about this issue, so that it can be fixed in a future version of ld-prime.

Fixes #104625
---
 llvm/lib/CodeGen/GlobalMerge.cpp              | 33 +++++++++++++++----
 .../Transforms/GlobalMerge/macho-sections.ll  | 25 ++++++++++++++
 .../Transforms/GlobalMerge/macho-symbols.ll   | 28 ++++++++++++++++
 3 files changed, 80 insertions(+), 6 deletions(-)
 create mode 100644 llvm/test/Transforms/GlobalMerge/macho-sections.ll
 create mode 100644 llvm/test/Transforms/GlobalMerge/macho-symbols.ll

diff --git a/llvm/lib/CodeGen/GlobalMerge.cpp b/llvm/lib/CodeGen/GlobalMerge.cpp
index c31ba6b31ad9ac..2003b74a314bac 100644
--- a/llvm/lib/CodeGen/GlobalMerge.cpp
+++ b/llvm/lib/CodeGen/GlobalMerge.cpp
@@ -578,12 +578,18 @@ bool GlobalMergeImpl::doMerge(const SmallVectorImpl<GlobalVariable *> &Globals,
       Globals[k]->replaceAllUsesWith(GEP);
       Globals[k]->eraseFromParent();
 
-      // When the linkage is not internal we must emit an alias for the original
-      // variable name as it may be accessed from another object. On non-Mach-O
-      // we can also emit an alias for internal linkage as it's safe to do so.
-      // It's not safe on Mach-O as the alias (and thus the portion of the
-      // MergedGlobals variable) may be dead stripped at link time.
-      if (Linkage != GlobalValue::InternalLinkage || !IsMachO) {
+      // Emit an alias for the original variable name. This is necessary for an
+      // external symbol, as it may be accessed from another object. For
+      // internal symbols, it's not strictly required, but it's useful.
+      //
+      // This _should_ also work on Mach-O ever since '.alt_entry' support was
+      // added in 2016. Unfortunately, there's a bug in ld-prime (present at
+      // least from Xcode 15.0 through Xcode 16.0), in which -dead_strip doesn't
+      // always honor alt_entry. To workaround this issue, we don't emit aliases
+      // on Mach-O. Except, we _must_ do so for external symbols. That means
+      // MergeExternal is broken with that linker. (That option is currently off
+      // by default on MachO).
+      if (!IsMachO || Linkage == GlobalValue::ExternalLinkage) {
         GlobalAlias *GA = GlobalAlias::create(Tys[StructIdxs[idx]], AddrSpace,
                                               Linkage, Name, GEP, &M);
         GA->setVisibility(Visibility);
@@ -640,6 +646,17 @@ void GlobalMergeImpl::setMustKeepGlobalVariables(Module &M) {
   }
 }
 
+// This function returns true if the given data Section name has custom
+// subsection-splitting semantics in Mach-O (such as splitting by a fixed size)
+//
+// See also ObjFile::parseSections and getRecordSize in lld/MachO/InputFiles.cpp
+static bool isSpecialMachOSection(StringRef Section) {
+  // Uses starts_with, since section attributes can appear at the end of the name.
+  return Section.starts_with("__DATA,__cfstring") ||
+      Section.starts_with("__DATA,__objc_classrefs") ||
+      Section.starts_with("__DATA,__objc_selrefs");
+}
+
 bool GlobalMergeImpl::run(Module &M) {
   if (!EnableGlobalMerge)
     return false;
@@ -678,6 +695,10 @@ bool GlobalMergeImpl::run(Module &M) {
     unsigned AddressSpace = PT->getAddressSpace();
     StringRef Section = GV.getSection();
 
+    // On Mach-O, some section names have special semantics. Don't merge these.
+    if (IsMachO && isSpecialMachOSection(Section))
+      continue;
+
     // Ignore all 'special' globals.
     if (GV.getName().starts_with("llvm.") || GV.getName().starts_with(".llvm."))
       continue;
diff --git a/llvm/test/Transforms/GlobalMerge/macho-sections.ll b/llvm/test/Transforms/GlobalMerge/macho-sections.ll
new file mode 100644
index 00000000000000..389480f41856e3
--- /dev/null
+++ b/llvm/test/Transforms/GlobalMerge/macho-sections.ll
@@ -0,0 +1,25 @@
+; RUN: opt -global-merge -global-merge-max-offset=100 -S -o - %s | FileCheck %s
+; RUN: opt -passes='global-merge<max-offset=100>' -S -o - %s | FileCheck %s
+
+;; Check that we do _not_ merge data with certain special section-names under Mach-O
+
+target datalayout = "e-p:64:64"
+target triple = "x86_64-apple-macos11"
+
+; CHECK-NOT: @_MergedGlobals
+
+ at cfstring1 = private global i32 1, section "__DATA,__cfstring"
+ at cfstring2 = private global i32 2, section "__DATA,__cfstring"
+ at objcclassrefs1 = private global i32 3, section "__DATA,__objc_classrefs,regular,no_dead_strip"
+ at objcclassrefs2 = private global i32 4, section "__DATA,__objc_classrefs,regular,no_dead_strip"
+ at objcselrefs1 = private global i32 5, section "__DATA,__objc_selrefs,literal_pointers,no_dead_strip"
+ at objcselrefs2 = private global i32 6, section "__DATA,__objc_selrefs,literal_pointers,no_dead_strip"
+define void @use() {
+  load ptr, ptr @cfstring1
+  load ptr, ptr @cfstring2
+  load ptr, ptr @objcclassrefs1
+  load ptr, ptr @objcclassrefs2
+  load ptr, ptr @objcselrefs1
+  load ptr, ptr @objcselrefs2
+  ret void
+}
diff --git a/llvm/test/Transforms/GlobalMerge/macho-symbols.ll b/llvm/test/Transforms/GlobalMerge/macho-symbols.ll
new file mode 100644
index 00000000000000..a3e99a55668bae
--- /dev/null
+++ b/llvm/test/Transforms/GlobalMerge/macho-symbols.ll
@@ -0,0 +1,28 @@
+; RUN: opt -global-merge -global-merge-max-offset=100 -S -o - %s | FileCheck %s
+; RUN: opt -passes='global-merge<max-offset=100>' -S -o - %s | FileCheck %s
+
+;; For Mach-O, we do not expect any alias symbols to be created for
+;; internal/private symbols by GlobalMerge.
+
+target datalayout = "e-p:64:64"
+target triple = "x86_64-apple-macos11"
+
+ at a = private global i32 1
+ at b = private global i32 2
+ at c = internal global i32 3
+ at d = internal global i32 4
+
+; CHECK: @_MergedGlobals = internal global <{ i32, i32, i32, i32 }> <{ i32 1, i32 2, i32 3, i32 4 }>, align 4
+; CHECK-NOT: alias
+
+define void @use() {
+  ; CHECK: load i32, ptr @_MergedGlobals,
+  %x = load i32, ptr @a
+  ; CHECK: load i32, ptr getelementptr inbounds (<{ i32, i32, i32, i32 }>, ptr @_MergedGlobals, i32 0, i32 1)
+  %y = load i32, ptr @b
+  ; CHECK: load i32, ptr getelementptr inbounds (<{ i32, i32, i32, i32 }>, ptr @_MergedGlobals, i32 0, i32 2)
+  %z1 = load i32, ptr @c
+  ; CHECK: load i32, ptr getelementptr inbounds (<{ i32, i32, i32, i32 }>, ptr @_MergedGlobals, i32 0, i32 3)
+  %z2 = load i32, ptr @d
+  ret void
+}

>From 92cb2d895bae331922f940fc0d4fc3f2301e6911 Mon Sep 17 00:00:00 2001
From: James Y Knight <jyknight at google.com>
Date: Thu, 26 Sep 2024 11:46:26 -0400
Subject: [PATCH 2/2] fix formatting; tweak test to use positive checks.

---
 llvm/lib/CodeGen/GlobalMerge.cpp                   |  7 ++++---
 llvm/test/Transforms/GlobalMerge/macho-sections.ll | 12 +++++++++---
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/llvm/lib/CodeGen/GlobalMerge.cpp b/llvm/lib/CodeGen/GlobalMerge.cpp
index 2003b74a314bac..007bea9a6585ef 100644
--- a/llvm/lib/CodeGen/GlobalMerge.cpp
+++ b/llvm/lib/CodeGen/GlobalMerge.cpp
@@ -651,10 +651,11 @@ void GlobalMergeImpl::setMustKeepGlobalVariables(Module &M) {
 //
 // See also ObjFile::parseSections and getRecordSize in lld/MachO/InputFiles.cpp
 static bool isSpecialMachOSection(StringRef Section) {
-  // Uses starts_with, since section attributes can appear at the end of the name.
+  // Uses starts_with, since section attributes can appear at the end of the
+  // name.
   return Section.starts_with("__DATA,__cfstring") ||
-      Section.starts_with("__DATA,__objc_classrefs") ||
-      Section.starts_with("__DATA,__objc_selrefs");
+         Section.starts_with("__DATA,__objc_classrefs") ||
+         Section.starts_with("__DATA,__objc_selrefs");
 }
 
 bool GlobalMergeImpl::run(Module &M) {
diff --git a/llvm/test/Transforms/GlobalMerge/macho-sections.ll b/llvm/test/Transforms/GlobalMerge/macho-sections.ll
index 389480f41856e3..2d6f81de7eb859 100644
--- a/llvm/test/Transforms/GlobalMerge/macho-sections.ll
+++ b/llvm/test/Transforms/GlobalMerge/macho-sections.ll
@@ -1,19 +1,25 @@
 ; RUN: opt -global-merge -global-merge-max-offset=100 -S -o - %s | FileCheck %s
 ; RUN: opt -passes='global-merge<max-offset=100>' -S -o - %s | FileCheck %s
 
-;; Check that we do _not_ merge data with certain special section-names under Mach-O
+;; Check that we do _not_ merge globals which are in certain special
+;; sections under Mach-O.
 
 target datalayout = "e-p:64:64"
 target triple = "x86_64-apple-macos11"
 
-; CHECK-NOT: @_MergedGlobals
-
+; CHECK: @cfstring1 = private global i32 1, section "__DATA,__cfstring"
 @cfstring1 = private global i32 1, section "__DATA,__cfstring"
+; CHECK: @cfstring2 = private global i32 2, section "__DATA,__cfstring"
 @cfstring2 = private global i32 2, section "__DATA,__cfstring"
+; CHECK: @objcclassrefs1 = private global i32 3, section "__DATA,__objc_classrefs,regular,no_dead_strip"
 @objcclassrefs1 = private global i32 3, section "__DATA,__objc_classrefs,regular,no_dead_strip"
+; CHECK: @objcclassrefs2 = private global i32 4, section "__DATA,__objc_classrefs,regular,no_dead_strip"
 @objcclassrefs2 = private global i32 4, section "__DATA,__objc_classrefs,regular,no_dead_strip"
+; CHECK: @objcselrefs1 = private global i32 5, section "__DATA,__objc_selrefs,literal_pointers,no_dead_strip"
 @objcselrefs1 = private global i32 5, section "__DATA,__objc_selrefs,literal_pointers,no_dead_strip"
+; CHECK: @objcselrefs2 = private global i32 6, section "__DATA,__objc_selrefs,literal_pointers,no_dead_strip"
 @objcselrefs2 = private global i32 6, section "__DATA,__objc_selrefs,literal_pointers,no_dead_strip"
+
 define void @use() {
   load ptr, ptr @cfstring1
   load ptr, ptr @cfstring2



More information about the llvm-commits mailing list