[lld] [lld/MachO] Fix assert on unsorted data-in-code entries (PR #81758)

Nico Weber via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 14 08:31:24 PST 2024


https://github.com/nico created https://github.com/llvm/llvm-project/pull/81758

When the data-in-code entries are in separate sections, they are not guaranteed to be sorted. In particular, 68b1cc36f3df marked some libc++ string functions as noinline, which leads to global ctors involving strings now producing data-in-code sections in __TEXT,__StaticInit, which is why this now happens in practice.

Since data-in-code entries are relatively rare and small, just sort them. No observed performance impact.

See also crbug.com/41487860

>From edd292b29c91c380722fcad095fa4441fd8a3756 Mon Sep 17 00:00:00 2001
From: Nico Weber <thakis at chromium.org>
Date: Wed, 14 Feb 2024 11:25:01 -0500
Subject: [PATCH] [lld/MachO] Fix assert on unsorted data-in-code entries

When the data-in-code entries are in separate sections, they are
not guaranteed to be sorted. In particular, 68b1cc36f3df marked
some libc++ string functions as noinline, which leads to global
ctors involving strings now producing data-in-code sections in
__TEXT,__StaticInit, which is why this now happens in practice.

Since data-in-code entries are relatively rare and small, just
sort them. No observed performance impact.

See also crbug.com/41487860
---
 lld/MachO/SyntheticSections.cpp | 13 ++++++++-----
 lld/test/MachO/data-in-code.s   | 22 +++++++++++++++++-----
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/lld/MachO/SyntheticSections.cpp b/lld/MachO/SyntheticSections.cpp
index 544847d3d448c4..6dbf27034f115e 100644
--- a/lld/MachO/SyntheticSections.cpp
+++ b/lld/MachO/SyntheticSections.cpp
@@ -1050,10 +1050,13 @@ static std::vector<MachO::data_in_code_entry> collectDataInCodeEntries() {
     if (entries.empty())
       continue;
 
-    assert(is_sorted(entries, [](const data_in_code_entry &lhs,
-                                 const data_in_code_entry &rhs) {
+    std::vector<MachO::data_in_code_entry> sortedEntries;
+    sortedEntries.assign(entries.begin(), entries.end());
+    llvm::sort(sortedEntries, [](const data_in_code_entry &lhs,
+                                      const data_in_code_entry &rhs) {
       return lhs.offset < rhs.offset;
-    }));
+    });
+
     // For each code subsection find 'data in code' entries residing in it.
     // Compute the new offset values as
     // <offset within subsection> + <subsection address> - <__TEXT address>.
@@ -1066,12 +1069,12 @@ static std::vector<MachO::data_in_code_entry> collectDataInCodeEntries() {
           continue;
         const uint64_t beginAddr = section->addr + subsec.offset;
         auto it = llvm::lower_bound(
-            entries, beginAddr,
+            sortedEntries, beginAddr,
             [](const MachO::data_in_code_entry &entry, uint64_t addr) {
               return entry.offset < addr;
             });
         const uint64_t endAddr = beginAddr + isec->getSize();
-        for (const auto end = entries.end();
+        for (const auto end = sortedEntries.end();
              it != end && it->offset + it->length <= endAddr; ++it)
           dataInCodeEntries.push_back(
               {static_cast<uint32_t>(isec->getVA(it->offset - beginAddr) -
diff --git a/lld/test/MachO/data-in-code.s b/lld/test/MachO/data-in-code.s
index 49aa7655a84b0c..1a09359cd26767 100644
--- a/lld/test/MachO/data-in-code.s
+++ b/lld/test/MachO/data-in-code.s
@@ -6,7 +6,7 @@
 # RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/bar.s -o %t/bar.o
 # RUN: %lld -lSystem %t/foo.o %t/bar.o -o %t/main.exe
 # RUN: llvm-otool -l %t/main.exe > %t/objdump
-# RUN: llvm-objdump --macho --data-in-code %t/main.exe >> %t/objdump
+# RUN: llvm-otool -Gv %t/main.exe >> %t/objdump
 # RUN: FileCheck %s < %t/objdump
 
 # CHECK-LABEL:  sectname __text
@@ -18,12 +18,13 @@
 # CHECK-LABEL:  cmd LC_DATA_IN_CODE
 # CHECK-NEXT:   cmdsize 16
 # CHECK-NEXT:   dataoff
-# CHECK-NEXT:   datasize 16
+# CHECK-NEXT:   datasize 24
 
-# CHECK-LABEL:  Data in code table (2 entries)
+# CHECK-LABEL:  Data in code table (3 entries)
 # CHECK-NEXT:   offset length kind
 # CHECK-NEXT:   [[#%x,TEXT + 28]] 24 JUMP_TABLE32
-# CHECK-NEXT:   [[#%x,TEXT + 68]] 12 JUMP_TABLE32
+# CHECK-NEXT:   [[#%x,TEXT + 68]] 8 JUMP_TABLE32
+# CHECK-NEXT:   [[#%x,TEXT + 84]] 12 JUMP_TABLE32
 
 # RUN: %lld -lSystem %t/foo.o %t/bar.o -no_data_in_code_info -o %t/main.exe
 # RUN: llvm-otool -l %t/main.exe | FileCheck --check-prefix=OMIT %s
@@ -32,11 +33,22 @@
 
 # RUN: %lld -lSystem %t/foo.o %t/bar.o -no_data_in_code_info -data_in_code_info -o %t/main.exe
 # RUN: llvm-otool -l %t/main.exe > %t/objdump
-# RUN: llvm-objdump --macho --data-in-code %t/main.exe >> %t/objdump
+# RUN: llvm-otool -Gv %t/main.exe >> %t/objdump
 # RUN: FileCheck %s < %t/objdump
 
 #--- foo.s
 .text
+.section __TEXT,__StaticInit,regular,pure_instructions
+.p2align 4, 0x90
+_some_init_function:
+retq
+.p2align 2, 0x90
+.data_region jt32
+.long 0
+.long 0
+.end_data_region
+
+.section __TEXT,__text,regular,pure_instructions
 .globl _main
 .p2align 4, 0x90
 _main:



More information about the llvm-commits mailing list