<div dir="ltr"><div dir="ltr">On Mon, Feb 22, 2021 at 8:08 AM <<a href="mailto:paul.robinson@sony.com">paul.robinson@sony.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="EN-US" style="overflow-wrap: break-word;">
<div class="gmail-m_-4316656911760013974WordSection1">
<p class="MsoNormal">What this proposes is really at the very edge of my understanding of ELF sections, but I have a side project that makes me think the “drop the rule entirely (D96914)” part will be a problem for people.<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">My side project is to enhance the googletest infrastructure to detect un-executed test assertions. When using Clang as the build compiler, my tactics depend on __start/__stop references to C identifier name sections retaining everything
in those sections. The data allocated to the section does not define any globals so there are no other GC roots. (I could *<b>almost</b>* get the same tactic to work with GCC as the build compiler, but there is one GCC quirk related to inline functions that
got in the way, so I do something more complicated and ugly there.)<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">In researching how to make this work, it appears that depending on this behavior of __start/__stop is a not-uncommon tactic; it is fairly well known to work with GNU linkers and LLD. In effect you can allocate static data elements to the
section at arbitrary points, and the __start/__stop symbols let you treat the entire thing as an array. It is impractical to generate unique global symbols for the data elements, and even if you do, it is not possible to generate references to those global
symbols from elsewhere. And in general, you do *<b>not</b>* want the static elements to be GC’d; it defeats the purpose of allocating them in the first place. There’s no use of SHF_LINK_ORDER or SHF_GROUP here; these are normal static variables allocated
to a custom section. In my case, I can’t depend on the order of elements anyway, and macro invocations can’t tell whether they’re invoked inside templates so I can’t use SHF_GROUP either. I end up sorting and deduplicating data manually when it’s time to
look at everything.<u></u><u></u></p>
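<p class="MsoNormal">A minimal sketch of the tactic described above, assuming GCC or Clang targeting ELF. The section name <tt>my_checks</tt>, the <tt>struct check_site</tt> type, and the <tt>REGISTER_CHECK</tt> macro are hypothetical and are not taken from googletest.</p>
<pre>
```c
/* Hypothetical record type; one instance is emitted per macro invocation. */
struct check_site {
    const char *file;
    int line;
};

/* The linker defines these two symbols because "my_checks" is a valid
 * C identifier; together they bracket every input section with that name. */
extern const struct check_site __start_my_checks[];
extern const struct check_site __stop_my_checks[];

#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_(a, b)

/* Place one static record in the custom section. `used` keeps the compiler
 * from dropping the otherwise unreferenced variable; by itself it does not
 * stop the linker from garbage-collecting the section. */
#define REGISTER_CHECK()                                          \
    static const struct check_site CONCAT(check_site_, __LINE__)  \
        __attribute__((section("my_checks"), used)) = {__FILE__, __LINE__}

REGISTER_CHECK();
REGISTER_CHECK();
REGISTER_CHECK();

/* Treat the whole section as one array and count its elements. */
int count_check_sites(void) {
    int n = 0;
    for (const struct check_site *p = __start_my_checks;
         p != __stop_my_checks; ++p)
        ++n;
    return n;
}
```
</pre>
<p class="MsoNormal">Nothing else refers to the individual records, so under --gc-sections they survive only because of the __start/__stop references in <tt>count_check_sites</tt>.</p>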
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">I see the idea for adding a new Clang attribute to “retain” something, but mainly what that does is create work for anyone depending on the historical behavior; we have to conditionalize the set of attributes based on whether Clang understands
“retain” and then cross our fingers hoping we don’t end up in a situation with a pre-retain Clang and a post-retain LLD, because that will break everything.<u></u><u></u></p>
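<p class="MsoNormal">A sketch of the conditionalization this would require, assuming a compiler that provides <tt>__has_attribute</tt>. The <tt>KEEP_IN</tt> macro and the <tt>demo_probes</tt> section are hypothetical names.</p>
<pre>
```c
/* Compilers without __has_attribute: treat every attribute as unknown. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Add `retain` only when the compiler understands it; otherwise fall back
 * to the historical attribute set. */
#if __has_attribute(retain)
#define KEEP_IN(sec) __attribute__((section(sec), used, retain))
#else
#define KEEP_IN(sec) __attribute__((section(sec), used))
#endif

static const int probe_a KEEP_IN("demo_probes") = 1;
static const int probe_b KEEP_IN("demo_probes") = 2;

extern const int __start_demo_probes[];
extern const int __stop_demo_probes[];

/* Walk the section as an array of int and add up the elements. */
int sum_probes(void) {
    int sum = 0;
    for (const int *p = __start_demo_probes; p != __stop_demo_probes; ++p)
        sum += *p;
    return sum;
}
```
</pre>
<p class="MsoNormal">The fallback branch is exactly the pre-retain Clang case: it emits no retain marking, so it still depends on the linker keeping the historical __start/__stop rule.</p>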
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">I hope this is clear enough, let me know if my explanation doesn’t make any sense.<u></u><u></u></p>
<p class="MsoNormal">Thanks,<u></u><u></u></p>
<p class="MsoNormal">--paulr</p></div></div></blockquote><div><br></div><div>On <a href="https://reviews.llvm.org/D96838#2585171">https://reviews.llvm.org/D96838#2585171</a><br></div><div><br></div><div><p style="margin:0px 0px 12px;padding:0px;border:0px;color:rgb(0,0,0);font-family:"Segoe UI","Segoe UI Emoji","Segoe UI Symbol",Lato,"Helvetica Neue",Helvetica,Arial,sans-serif;font-size:13px">> Aha; attribute <tt class="gmail-remarkup-monospaced" style="background:rgba(71,87,120,0.1);padding:1px 4px;border-radius:3px;white-space:pre-wrap;line-break:anywhere;margin-top:0px">used</tt> *by itself* is not sufficient to preserve sections in the output. But the <tt class="gmail-remarkup-monospaced" style="background:rgba(71,87,120,0.1);padding:1px 4px;border-radius:3px;white-space:pre-wrap;line-break:anywhere">__start_/__stop_</tt> symbols implicitly create a reference to each of the named sections, and that implicit reference can preserve them in the output (assuming gc roots etc). So, the idea is that attribute <tt class="gmail-remarkup-monospaced" style="background:rgba(71,87,120,0.1);padding:1px 4px;border-radius:3px;white-space:pre-wrap;line-break:anywhere">retain</tt> can be used *instead* of the <tt class="gmail-remarkup-monospaced" style="background:rgba(71,87,120,0.1);padding:1px 4px;border-radius:3px;white-space:pre-wrap;line-break:anywhere">__start_/__stop_</tt> symbols, to preserve sections in the output (with the advantage that it will work even for sections that do not have a C-identifier name).</p><p style="margin:0px 0px 12px;padding:0px;border:0px;color:rgb(0,0,0);font-family:"Segoe UI","Segoe UI Emoji","Segoe UI Symbol",Lato,"Helvetica Neue",Helvetica,Arial,sans-serif;font-size:13px">> Thanks for helping me understand this from a user perspective. That will be important when you go to write the release note for this new attribute.</p></div><div><br></div><div>I dug up the history a bit. 
gold had this behavior in 2010.</div><div>GNU ld got a workaround in 2010 partly because glibc refused to fix the issue (facepalm) <a href="https://sourceware.org/bugzilla/show_bug.cgi?id=3400">https://sourceware.org/bugzilla/show_bug.cgi?id=3400</a> </div><div>Before 2015, the GNU ld behavior only applied to sections in the same .o of the __start_/__stop_ references. In 2015-10, the behavior finally applied to other .o files.<br></div><div><br></div><div>2015-10 is relatively new, so I don't think there are many applications depending on the behavior.</div><div>But there are indeed some applications.</div><div><br></div><div>I submitted a GNU ld patch for -z start-stop-gc which has been accepted by Alan Modra (<a href="https://sourceware.org/bugzilla/show_bug.cgi?id=27451">https://sourceware.org/bugzilla/show_bug.cgi?id=27451</a>).</div><div>I think at some point (14.0.0?) we can still switch the default.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div lang="EN-US" style="overflow-wrap: break-word;"><div class="gmail-m_-4316656911760013974WordSection1"><p class="MsoNormal"><u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<div style="border-top:none;border-right:none;border-bottom:none;border-left:1.5pt solid blue;padding:0in 0in 0in 4pt">
<div>
<div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(225,225,225);padding:3pt 0in 0in">
<p class="MsoNormal"><b>From:</b> llvm-dev <<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>> <b>On Behalf Of
</b>James Henderson via llvm-dev<br>
<b>Sent:</b> Monday, February 22, 2021 5:35 AM<br>
<b>To:</b> Fangrui Song <<a href="mailto:maskray@google.com" target="_blank">maskray@google.com</a>><br>
<b>Cc:</b> LLVM Dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>><br>
<b>Subject:</b> Re: [llvm-dev] ld.lld "Don't let __start_/__stop_ retain C identifier name sections" && Swift<u></u><u></u></p>
</div>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">I've filed an internal issue tracker for us to investigate the impact of this proposal, although I don't know when we'll get a chance to schedule the work at this point. Also, it's worth noting that we can't test all downstream codebases
that potentially could use this feature, so we'll likely want some potential source code or switch that will allow our users to keep their unused sections.<u></u><u></u></p>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<p class="MsoNormal">On Sat, 20 Feb 2021 at 00:01, Fangrui Song via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<u></u><u></u></p>
</div>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0in 0in 0in 6pt;margin-left:4.8pt;margin-right:0in">
<p class="MsoNormal"><br>
tl;dr With --gc-sections, I think the rule "__start_foo/__stop_foo references from live sections retain all non-SHF_LINK_ORDER input sections foo" does not carry its weight, so I'd like to drop it entirely in
<a href="https://urldefense.com/v3/__https:/reviews.llvm.org/D96914__;!!JmoZiZGBv3RvKRSx!pAsf5XQa2GgpqsyGaZGjOEeqTeliL8ikbNK-wdQ1KCaKgZDaTWVo-EyJNUPFMdsL_g$" target="_blank">
https://reviews.llvm.org/D96914</a><br>
<br>
I ran a large-scale internal test covering a huge amount of OSS code and spotted two issues:<br>
<br>
(1) Linking systemd. <a href="https://urldefense.com/v3/__https:/github.com/systemd/systemd/blob/main/src/libsystemd/sd-bus/bus-error.h*L33__;Iw!!JmoZiZGBv3RvKRSx!pAsf5XQa2GgpqsyGaZGjOEeqTeliL8ikbNK-wdQ1KCaKgZDaTWVo-EyJNUNdvKsIKw$" target="_blank">
https://github.com/systemd/systemd/blob/main/src/libsystemd/sd-bus/bus-error.h#L33</a> there will be an `undefined symbol: __start_SYSTEMD_BUS_ERROR_MAP` error. Supposedly it can be trivially fixed by using undefined weak symbols on __start_/__stop_.<br>
(2) Linking Swift. There will be errors like `undefined hidden symbol: __start_swift5_protocols`.<br>
<a href="https://urldefense.com/v3/__https:/github.com/apple/swift/blob/main/stdlib/public/runtime/SwiftRT-ELF.cpp__;!!JmoZiZGBv3RvKRSx!pAsf5XQa2GgpqsyGaZGjOEeqTeliL8ikbNK-wdQ1KCaKgZDaTWVo-EyJNUPUQLE7WQ$" target="_blank">
https://github.com/apple/swift/blob/main/stdlib/public/runtime/SwiftRT-ELF.cpp</a><br>
It seems that trivially making `extern const char __start_##name` weak does not work.<br>
The code relies on some `swift5_*` input sections being GC roots.<br>
(If someone can file an issue with Swift, I'd appreciate that.)<br>
(If Swift folks can fix it, I'll give my big thanks:)<br>
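<br>
A minimal sketch of the "undefined weak symbols" idea from (1), assuming GCC or Clang targeting ELF. The section names `demo_absent_map` and `demo_present_map`, the struct, and the function names are hypothetical and are not taken from systemd.<br>
<span style="white-space:pre-wrap">
```c
struct error_map_entry {
    const char *name;
    int code;
};

/* Weak references: if no section with this name ends up in the link, the
 * symbols stay undefined and resolve to address 0 instead of causing an
 * "undefined symbol" error. */
extern const struct error_map_entry __start_demo_absent_map[] __attribute__((weak));
extern const struct error_map_entry __stop_demo_absent_map[] __attribute__((weak));
extern const struct error_map_entry __start_demo_present_map[] __attribute__((weak));
extern const struct error_map_entry __stop_demo_present_map[] __attribute__((weak));

/* One entry so that "demo_present_map" exists in this link. */
static const struct error_map_entry demo_entry
    __attribute__((section("demo_present_map"), used)) = {"demo.Error", 5};

static int count_entries(const struct error_map_entry *start,
                         const struct error_map_entry *stop) {
    int n = 0;
    if (!start)
        return 0; /* section absent: the weak symbol resolved to 0 */
    for (const struct error_map_entry *p = start; p != stop; ++p)
        ++n;
    return n;
}

int count_absent_entries(void) {
    return count_entries(__start_demo_absent_map, __stop_demo_absent_map);
}

int count_present_entries(void) {
    return count_entries(__start_demo_present_map, __stop_demo_present_map);
}
```
</span><br>
The price of the weak declarations is that every user of the symbols has to handle a null start pointer.<br>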
<br>
This can still potentially break some proprietary code so I am sending this heads-up.<br>
I'll place the rationale below (it is complicated).<br>
<br>
<br>
<br>
The current rule is:<br>
<br>
__start_/__stop_ references retain all non-SHF_LINK_ORDER C identifier name sections.<br>
<br>
After <a href="https://urldefense.com/v3/__https:/reviews.llvm.org/D96753__;!!JmoZiZGBv3RvKRSx!pAsf5XQa2GgpqsyGaZGjOEeqTeliL8ikbNK-wdQ1KCaKgZDaTWVo-EyJNUOp7Zgkwg$" target="_blank">
https://reviews.llvm.org/D96753</a> , it will become<br>
<br>
__start_/__stop_ references retain all non-SHF_LINK_ORDER non-SHF_GROUP C identifier name sections.<br>
<br>
(The section group special case is to allow garbage collecting __llvm_prf_* sections for -fprofile-generate/-fprofile-instr-generate. The saving is huge.)<br>
<br>
Personally I'd drop the rule entirely (D96914) (with support from jhenderson and phosek), i.e.<br>
<br>
__start_/__stop_ references do not retain C identifier name sections.<br>
<br>
and hope folks can fix Swift/systemd to not rely on the original rule.<br>
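<br>
The three variants of the rule above can be written as one small predicate. This is only a model for comparing them, not LLD's implementation; the enum, struct, and function names are hypothetical.<br>
<span style="white-space:pre-wrap">
```c
enum start_stop_rule {
    RULE_CURRENT, /* today's behavior */
    RULE_D96753,  /* additionally exempt section group members */
    RULE_D96914   /* drop the rule entirely */
};

struct input_section {
    int c_identifier_name; /* nonzero if the name is a valid C identifier */
    int shf_link_order;    /* nonzero if SHF_LINK_ORDER is set */
    int in_group;          /* nonzero if the section is in a section group */
};

/* Does a __start_/__stop_ reference from a live section retain `s`? */
int retained_by_start_stop(enum start_stop_rule rule,
                           const struct input_section *s) {
    if (!s->c_identifier_name)
        return 0; /* the rule only ever applied to C identifier names */
    switch (rule) {
    case RULE_CURRENT:
        return !s->shf_link_order;
    case RULE_D96753:
        return !s->shf_link_order && !s->in_group;
    case RULE_D96914:
        return 0;
    }
    return 0;
}
```
</span><br>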
<br>
---<br>
<br>
I have placed more details in <a href="https://urldefense.com/v3/__https:/maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order__;!!JmoZiZGBv3RvKRSx!pAsf5XQa2GgpqsyGaZGjOEeqTeliL8ikbNK-wdQ1KCaKgZDaTWVo-EyJNUPBUkT5Qg$" target="_blank">
https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order</a><br>
discussing why the rule gets in the way and why SHF_LINK_ORDER is not a solution.<br>
(Section groups have size overhead for a single metadata section.)<br>
<br>
I'll paste the relevant paragraph here for your convenience.<br>
(I may edit my article to make it clear)<br>
<br>
This is a common usage of metadata sections: each text section references a metadata section.<br>
The metadata sections have a C identifier name to allow the runtime to collect them via `__start_`/`__stop_` symbols.<br>
<br>
Since `__start_`/`__stop_` references are always present from live sections, the C identifier name sections appear like GC roots, which means they cannot be discarded by `ld --gc-sections`.<br>
<br>
For users who want to keep GC for these metadata sections, they can set the `SHF_LINK_ORDER` flag or make the metadata section a member of a section group.<br>
(GNU ld does not implement the `SHF_LINK_ORDER` rule yet. <<a href="https://urldefense.com/v3/__https:/sourceware.org/bugzilla/show_bug.cgi?id=27259__;!!JmoZiZGBv3RvKRSx!pAsf5XQa2GgpqsyGaZGjOEeqTeliL8ikbNK-wdQ1KCaKgZDaTWVo-EyJNUNYIjk40Q$" target="_blank">https://sourceware.org/bugzilla/show_bug.cgi?id=27259</a>>)<br>
(In LLD, some folks have concluded that this rule does not carry its weight, so it would be nice if we could drop it ([D96914](<a href="https://urldefense.com/v3/__https:/reviews.llvm.org/D96914))__;!!JmoZiZGBv3RvKRSx!pAsf5XQa2GgpqsyGaZGjOEeqTeliL8ikbNK-wdQ1KCaKgZDaTWVo-EyJNUMtgSFzag$" target="_blank">https://reviews.llvm.org/D96914))</a>.)<br>
<br>
Now, let's walk through an `SHF_LINK_ORDER` example where inlining can cause problems.<br>
<br>
```asm<br>
# Monolithic meta.<br>
.globl _start<br>
_start:<br>
leaq __start_meta(%rip), %rdi<br>
leaq __stop_meta(%rip), %rsi<br>
<br>
.section .text.foo,"ax",@progbits<br>
.globl foo<br>
foo:<br>
leaq .Lmeta.foo(%rip), %rax<br>
ret<br>
<br>
.section .text.bar,"ax",@progbits<br>
.globl bar<br>
bar:<br>
call foo<br>
leaq .Lmeta.bar(%rip), %rax<br>
ret<br>
<br>
.section meta,"a",@progbits<br>
.Lmeta.foo:<br>
.byte 0<br>
.Lmeta.bar:<br>
.byte 1<br>
```<br>
<br>
The monolithic `meta` does not enable precise garbage collection.<br>
It may be tempting to make `meta` separate and add the `SHF_LINK_ORDER` flag (to defeat the "C identifier name sections appear like GC roots" rule):<br>
<br>
```asm<br>
.section meta,"ao",@progbits,foo<br>
.Lmeta.foo:<br>
.byte 0<br>
<br>
.section meta,"ao",@progbits,bar<br>
.Lmeta.bar:<br>
.byte 1<br>
```<br>
<br>
However, due to inlining (foo into bar), the `meta` for `.text.foo` may now get a reference from another text section `.text.bar`, breaking an implicit assumption of `SHF_LINK_ORDER`: such a section can only be referenced from its linked-to section.<br>
```asm<br>
# Both .text.foo and .text.bar reference meta.<br>
.section .text.foo,"ax",@progbits<br>
.globl foo<br>
foo:<br>
leaq .Lmeta.foo(%rip), %rax<br>
ret<br>
<br>
.section .text.bar,"ax",@progbits<br>
.globl bar<br>
bar:<br>
leaq .Lmeta.foo(%rip), %rax<br>
leaq .Lmeta.bar(%rip), %rax<br>
ret<br>
```<br>
<br>
If `.text.bar` (caller) is retained while `.text.foo` (callee) is discarded, the `meta` for `foo` will link to a discarded section: an invalid state.<br>
LLD has an error: `{{.*}}:(meta): sh_link points to discarded section {{.*}}:(.text.foo)`.<br>
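<br>
The invalid state can be reproduced with a toy mark phase over the four sections from the example. This is a simplified model under the assumption that `.text.bar` is the only GC root; it is not how LLD is implemented, and every name in it is hypothetical.<br>
<span style="white-space:pre-wrap">
```c
enum { TEXT_FOO, TEXT_BAR, META_FOO, META_BAR, NUM_SECTIONS };

struct section_model {
    int refs[2];   /* sections referenced by relocations, -1 = unused slot */
    int linked_to; /* sh_link target of an SHF_LINK_ORDER section, else -1 */
};

static void mark(const struct section_model *secs, int *live, int i) {
    if (i == -1 || live[i])
        return;
    live[i] = 1;
    /* Follow ordinary references. */
    for (int k = 0; k != 2; ++k)
        mark(secs, live, secs[i].refs[k]);
    /* A live section keeps the SHF_LINK_ORDER sections linked to it. */
    for (int j = 0; j != NUM_SECTIONS; ++j)
        if (secs[j].linked_to == i)
            mark(secs, live, j);
}

/* Returns the index of a live section whose linked-to section was
 * discarded, or -1 if the result is consistent. */
int find_dangling_link(int foo_inlined_into_bar) {
    struct section_model secs[NUM_SECTIONS] = {
        [TEXT_FOO] = {{META_FOO, -1}, -1},
        /* Without inlining bar calls foo; with inlining it references
         * foo's metadata directly and no longer references foo. */
        [TEXT_BAR] = {{META_BAR, foo_inlined_into_bar ? META_FOO : TEXT_FOO}, -1},
        [META_FOO] = {{-1, -1}, TEXT_FOO},
        [META_BAR] = {{-1, -1}, TEXT_BAR},
    };
    int live[NUM_SECTIONS] = {0};
    mark(secs, live, TEXT_BAR);
    for (int j = 0; j != NUM_SECTIONS; ++j)
        if (live[j] && secs[j].linked_to != -1 && !live[secs[j].linked_to])
            return j;
    return -1;
}
```
</span><br>
Without inlining everything stays live and the check passes; with inlining, the `meta` for `foo` is live while `.text.foo` is not, which is the situation the error above reports.<br>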
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="https://urldefense.com/v3/__https:/lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev__;!!JmoZiZGBv3RvKRSx!pAsf5XQa2GgpqsyGaZGjOEeqTeliL8ikbNK-wdQ1KCaKgZDaTWVo-EyJNUNDA2XUCA$" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><u></u><u></u></p>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr">宋方睿</div></div></div>