<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/97155>97155</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[lld-macho] Symbols in `__mod_init_func` are handled hackily with `-init_offsets`
</td>
</tr>
<tr>
<th>Labels</th>
<td>
code-quality,
lld:MachO
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
BertalanD
</td>
</tr>
</table>
<pre>
Background: when linking macOS binaries with chained fixups, we need to transform initializers stored in `__mod_init_func` (an array of pointers *rebased* through the usual means at runtime) to `__init_offsets` (an array of 32-bit offsets to initializers).
Normally, we only need to care about the relocations in the input `__mod_init_func` sections. A problem arises when there are also *symbols* defined inside them. LLD currently ignores such symbols -- that is, we don't add them to the symbol table (since the location they point to doesn't exist anymore). This doesn't happen in regular binaries created by Clang, `swiftc`, `rustc`, etc., but there have been a few instances where it led to crashes:
- In [#94716](https://github.com/llvm/llvm-project/issues/94716), we see a Go-generated binary (the repro file is broken and doesn't include a bunch of Swift stuff from the SDK -- TODO!). Here, the symbol (`__rt0_arm64_ios_lib.ptr`) is defined inside `__mod_init_func` as a non-exported symbol; we crash when trying to add it to the symbol table (it has no corresponding output section, so we can't set `n_sect`).
```
$ lipo -thin arm64 Users/snqu/Desktop/chromium/chromium/src/ios/chrome/vpn_extension/Tun2socks.framework/Tun2socks -o Tun2socks
$ $(brew --prefix llvm)/bin/llvm-ar x Tun2socks go.o
$ nm go.o | grep __rt0_arm64_ios_lib\.ptr
0000000001af3a60 s __rt0_arm64_ios_lib.ptr
```
Backtrace excerpt:
```
* thread #6, stop reason = EXC_BAD_ACCESS (code=1, address=0x34)
* frame #0: 0x00000001002a0894 ld64.lld`SymtabSectionImpl<lld::macho::LP64>::writeTo(this=<unavailable>, buf=<unavailable>) const at SyntheticSections.cpp:1415:50 [opt]
```
- This [Chromium bug](https://issues.chromium.org/issues/325133695) is related to the [`curl` Rust crate](https://github.com/alexcrichton/curl-rust), which [deliberately defines](https://github.com/alexcrichton/curl-rust/blob/c01261310f13c85dc70d4e8a1ef87504662a1154/src/lib.rs#L123-L151) a symbol among the initializers, apparently to sidestep an old linker/compiler dead-stripping issue. Here, the symbol (`__RNvCsiLjxBhyzEAX_4curl9INIT_CTOR`; `curl::INIT_CTOR`) is externally visible; we crash when we try to query its address while adding it to the exports trie.
Backtrace excerpt:
```
frame #3: 0x000000019cf14d20 libsystem_c.dylib`__assert_rtn + 284
frame #4: 0x00000001003053e8 ld64.lld`lld::macho::Defined::getVA() const (.cold.2) at Symbols.cpp:97:5 [opt]
* frame #5: 0x0000000100289ee0 ld64.lld`lld::macho::Defined::getVA(this=0x000000014b01ca58) const at Symbols.cpp:97:5 [opt]
frame #6: 0x0000000100255fe4 ld64.lld`lld::macho::TrieBuilder::sortAndBuild(llvm::MutableArrayRef<lld::macho::Symbol const*>, lld::macho::TrieNode*, unsigned long, unsigned long) [inlined] (anonymous namespace)::ExportInfo::ExportInfo(this=<unavailable>, sym=0x000000014b01ca58, imageBase=4294967296) at ExportTrie.cpp:64:21 [opt]
```
```
$ ar x Users/hwennborg/chromium/src/third_party/rust-src/build/x86_64-apple-darwin/stage2-tools/x86_64-apple-darwin/release/deps/libcurl-f5fa2775c6309a20.rlib curl-f5fa2775c6309a20.curl.da8bc63371d5233d-cgu.09.rcgu.o
$ nm curl-f5fa2775c6309a20.curl.da8bc63371d5233d-cgu.09.rcgu.o -s __DATA __mod_init_func
0000000000000d40 S __RNvCsiLjxBhyzEAX_4curl9INIT_CTOR
```
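
For reference, the transformation described in the background can be modeled with a small standalone sketch. This is not LLD's code: the function name `mod_init_func_to_init_offsets` is made up for illustration, and it assumes that each offset is taken relative to the image base and stored as a little-endian 32-bit value.

```python
def mod_init_func_to_init_offsets(pointers, image_base):
    """Toy model: turn absolute initializer addresses (`__mod_init_func`)
    into packed 32-bit offsets (`__init_offsets`).

    Assumption: offsets are relative to `image_base`, little-endian.
    """
    out = bytearray()
    for ptr in pointers:
        off = ptr - image_base
        # An offset that does not fit in 32 bits cannot be represented.
        if not (0 <= off <= 0xFFFFFFFF):
            raise ValueError(f"initializer at {ptr:#x} is out of 32-bit range")
        out += off.to_bytes(4, "little")
    return bytes(out)
```

Note that only the pointer *values* survive the conversion. Nothing in the output corresponds to the address of an individual pointer slot, which is why a symbol defined inside the input section has no meaningful address afterwards.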
## Fix ideas
1. Completely remove `__mod_init_func` from the list of input sections
I thought my [original patch](https://github.com/llvm/llvm-project/commit/389e0a81a15ca688cf85a82d04aeaa68d18da161) would have this effect: we do not create an OutputSection for it and don't even include it in the global `inputSections` list.
In reality, this is not enough; the symbols are still added to the symbol table (as `__mod_init_func` is present in the `symbols` array, `ObjFile::parseSymbols` will reach it).
(+) if we encounter a `Defined` symbol during the program's execution, we know for sure that it has an address, no need to check for a poison flag.
(-) sounds a bit hackish; currently there is a one-to-one correspondence between `ObjFile::sections` and the input file's contents
2. Create a "poisoned" state for `Defined` Symbols
(+) Requires the least modification to the existing code
(+) There will still be an entry (though poisoned) in the symbol table, so we'll be able to emit useful warnings if someone actually refers to the symbol.
NOTE: this is basically the current workaround I ended up going for, except that I use the `isLive()` mechanism from dead-stripping.
3. Some other idea?
- maybe just add a few extra checks to the known crashy places to see if the symbols refer to a deleted section? sounds very fragile
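
To make idea 2 more concrete, here is a toy model of a "poisoned" state. It is only a sketch: the `Defined` class, the `poisoned` field, `get_va`, and `symbols_to_emit` are hypothetical stand-ins and do not mirror LLD's real classes or signatures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Defined:
    name: str
    section: Optional[str]  # None models a symbol whose input section was dropped
    value: int = 0
    poisoned: bool = False  # hypothetical flag, not an existing LLD field

    def get_va(self, section_addrs):
        # Instead of asserting deep inside address computation, report
        # which symbol was referenced and why it has no address.
        if self.poisoned:
            raise LookupError(
                f"{self.name} is defined in a discarded __mod_init_func section"
            )
        return section_addrs[self.section] + self.value


def symbols_to_emit(symbols):
    """Symbol-table and export-trie writers would skip poisoned entries."""
    return [sym for sym in symbols if not sym.poisoned]
```

With a model like this, the two crashes above would turn into either a silently skipped symbol (when nothing refers to it) or a diagnostic naming the symbol (when something does).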
The workarounds mentioned above only work if the symbols are not actually referenced in relocations. If they are, we get different (but equally undesirable) behaviors.
```asm
; test.s
.globl _main
.text
_main:
leaq _init_slot(%rip), %rax
.section __DATA,__mod_init_func,mod_init_funcs
_init_slot:
.quad _main
```
<details>
<summary>ld64 crash</summary>
```
❯ clang test.s -Wl,-ld_classic
ld: warning: alignment (1) of atom '_init_slot' is too small and may result in unaligned pointers
0 0x10659e807 __assert_rtn + 137
1 0x1065a79e3 ld::tool::OutputFile::addressOf(ld::Internal const&, ld::Fixup const*, ld::Atom const**) (.cold.1) + 35
2 0x1063fc5e4 ld::tool::OutputFile::addressOf(ld::Internal const&, ld::Fixup const*, ld::Atom const**) + 116
3 0x1063fd087 ld::tool::OutputFile::applyFixUps(ld::Internal&, unsigned long long, ld::Atom const*, unsigned char*) + 599
4 0x106405818 ___ZN2ld4tool10OutputFile10writeAtomsERNS_8InternalEPh_block_invoke + 504
5 0x7ff815881def _dispatch_client_callout2 + 8
6 0x7ff815893547 _dispatch_apply_invoke3 + 431
7 0x7ff815881dbc _dispatch_client_callout + 8
8 0x7ff81588304e _dispatch_once_callout + 20
9 0x7ff815892740 _dispatch_apply_invoke + 184
10 0x7ff815881dbc _dispatch_client_callout + 8
11 0x7ff8158912ca _dispatch_root_queue_drain + 871
12 0x7ff81589184f _dispatch_worker_thread2 + 152
13 0x7ff815a1fb43 _pthread_wqthread + 262
A linker snapshot was created at:
/tmp/a.out-2024-06-25-091629.ld-snapshot
ld: Assertion failed: (_mode == modeFinalAddress), function finalAddress, file ld.hpp, line 1413.
```
</details>
<details>
<summary>ld_prime broken (?) binary</summary>
```
❯ clang test.s -Wl,-ld_new
❯ objdump -d a.out
a.out: file format mach-o 64-bit x86-64
Disassembly of section __TEXT,__text:
0000000100000f9d <_main>:
100000f9d: 48 8d 05 5c f0 ff ff leaq -4004(%rip), %rax ## 0x100000000
# Relocation refers to the beginning of the file, `__mh_execute_header`???
```
</details>
## Additional questions
Other similar transformations have been added recently, e.g. ObjC relative method lists (and possibly others?). Could we theoretically encounter a similar scenario there?
</pre>