<div dir="ltr">It is very important to resist features that add needless complexity :)<div><br></div><div>I will reply to the (great) points you have raised tomorrow. The hot-patching feature is proprietary technology and I need to check how much I can disclose about it - sorry! I will also put up a prototype implementation so that the complexity of the implementation can be judged. I have not attempted to describe all GOT/PLTs, only the ones that are structured "normally". Describing x86-64's IBT PLT would need an extension to the binary format. I am not convinced we need to describe every variation to add value. If the binary format can describe the commonly used GOT/PLT structures then I believe that is sufficient. We can design the binary format to be flexible so that it can be extended in the future if support is required for a GOT/PLT structure that cannot currently be described.</div><div><br></div><div>Do you have an opinion on the other sections? In particular the linkmap section? That section carries the most important information for our hot-patching implementation and it also has clear benefits over the current -Map file option.</div><div><br></div><div>Thanks.<br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Sep 24, 2021 at 12:40 AM Fangrui Song <<a href="mailto:maskray@google.com">maskray@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

On 2021-09-22, bd1976 llvm wrote:<br>

>  Thanks for looking at this proposal.<br>

><br>

>On Tue, Sep 21, 2021 at 3:43 AM Fangrui Song <<a href="mailto:maskray@google.com" target="_blank">maskray@google.com</a>> wrote:<br>

><br>

>> On 2021-09-21, bd1976 llvm via llvm-dev wrote:<br>

>> >As mentioned Sony would like LLD to optionally emit sections that describe<br>

>> >the GOT and PLT.<br>

>> ><br>

>> >The proposed binary format of these sections is as follows:<br>

>> ><br>

>> >.debug_lld_got<br>

>> >==============<br>

>> ><br>

>> >The .debug_lld_got section contains a GOT description. The GOT description<br>

>> >begins with a header composed of the following fields:<br>

>> ><br>

>> >length (uleb)<br>

>> >- The length in bytes of the GOT description not including the length<br>

>> >field itself.<br>

>> >- This allows for padding to be added to the section, useful for purposes<br>

>> >such as slop for incremental linking.<br>

>><br>

>> I am dubious whether people will find incremental linking useful :)<br>

>> <a href="https://news.ycombinator.com/item?id=26233244" rel="noreferrer" target="_blank">https://news.ycombinator.com/item?id=26233244</a> from Rui Ueyama<br>

>> and<br>

>> <a href="https://sourceware.org/pipermail/binutils/2021-September/117828.html" rel="noreferrer" target="_blank">https://sourceware.org/pipermail/binutils/2021-September/117828.html</a><br>

>> from Cary Coutant:<br>

>> "Do you think you'd ever want incremental linking on powerpc? Frankly,<br>

>> the effort for just the one target platform was pretty high, the<br>

>> maintenance on it is burdensome, and I'm tempted to deprecate it and<br>

>> rip it out at some point in the future."<br>

>><br>

><br>

>I generally tend to agree w.r.t incremental linking. However, supporting<br>

>the ability to include extra space in a section could have many uses and<br>

>therefore I think that it is something that section formats should support<br>

>as long as it is cheap to do so. Having said that we don't actually have a<br>

>need right now for this so I'm happy to drop it from the specification.<br>

><br>

><br>

>><br>

>> >- The value cannot exceed Elf_Off.<br>

>> ><br>

>> >version (uleb)<br>

>> >- The version of the description information.<br>

>> >- Currently, 0.<br>

>> >- The value cannot exceed Elf_Word.<br>

>> ><br>

>> >The header is then followed by a list of entry descriptions.<br>

>> >Each entry description describes the GOT entry with the same index.<br>

>> >Each entry description starts with three ulebs:<br>

>> ><br>

>> >- The first uleb gives the number of ulebs used by this description (so<br>

>> >that the description can be skipped if the category isn't understood). The<br>

>> >value cannot exceed Elf_Word.<br>

>> >- The second uleb gives the number of GOT slots* used by this GOT entry.<br>

>> >The value cannot exceed Elf_Word.<br>

>> >- The third uleb encodes the category of the GOT entry. The value cannot<br>

>> >exceed Elf_Word.<br>

>> ><br>

>> >* Except for GOT_CAT_PADDING entries where this field gives the number of<br>

>> >bytes of padding (the value cannot exceed Elf_Off) not the number of GOT<br>

>> >slots.<br>

>> ><br>

>> >A category encoding can specify multiple associated arguments. Argument<br>

>> >interpretation is specified by the encoding. If an encoding requires<br>

>> >arguments, the bytes for those follow the bytes for the second uleb in the<br>

>> >entry description.<br>

>> ><br>

>> >Categories are:<br>

>> ><br>

>> >Encoding                             Argument *      Size (slots)         Notes<br>

>> >GOT_CAT_UNKNOWN                      none            1                    Unknown area of the GOT.<br>

>> >GOT_CAT_PADDING                      none            <variable>           Padding between GOT regions. The size field gives the padding size in bytes not the number of GOT slots.<br>

>> >GOT_CAT_GOTPLT_HEADER                none            <target dependent>   The .got.plt header. x86_64 size = 3 slots.<br>

>> >GOT_CAT_GOT                          symbol index    1                    Normal entry for a symbol.<br>

>> >GOT_CAT_PLTGOT                       symbol index    1                    .got.plt entry for a PLT reference to a symbol.<br>

>> >GOT_CAT_IGOTPLT                      symbol index    1                    .igot.plt entry for an ifunc.<br>

>> >GOT_CAT_IGOTCANONICAL                symbol index    1                    GOT entry for canonical PLT entry for non-preemptible ifunc case.<br>

>> >GOT_CAT_TLSDESC                      symbol index    2                    GOT entry for a TLSDESC slot.<br>

>> >GOT_CAT_TLS_GD                       symbol index    2                    GOT entry for a GD TLS reference.<br>

>> >GOT_CAT_TLS_LD                       none            2                    GOT entry for tls_index structure for an LD TLS reference.<br>

>> >GOT_CAT_TLS_IE                       symbol index    1                    GOT entry for an IE TLS reference.<br>

>> >GOT_CAT_PPC64_V2_ABI_TLSLD_GOT_OFF   symbol index    1                    PPC64 specific TLSLD GOT slot.<br>

>> ><br>
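To make the layout above concrete, here is a minimal sketch of a reader for a GOT description of this shape. It is not taken from the prototype. The function names are hypothetical, and it assumes that the leading count of each entry covers every uleb in that entry (the count itself, the size, the category, and any arguments); the proposal text does not pin this down. Category values are treated as opaque integers because the proposal does not assign numbers to them.

```python
def read_uleb128(buf, pos):
    """Decode one ULEB128 value from buf at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):       # high bit clear: last byte of the value
            return result, pos
        shift += 7


def parse_got_description(buf):
    """Parse a GOT description blob; return (version, entries).

    Each entry is a tuple (size, category, args).
    Assumption: the per-entry count includes the three leading ulebs.
    """
    length, pos = read_uleb128(buf, 0)
    end = pos + length              # bytes past `end` are padding, not description
    version, pos = read_uleb128(buf, pos)
    entries = []
    while pos < end:
        count, pos = read_uleb128(buf, pos)
        size, pos = read_uleb128(buf, pos)
        category, pos = read_uleb128(buf, pos)
        args = []
        for _ in range(count - 3):  # remaining ulebs are category arguments
            arg, pos = read_uleb128(buf, pos)
            args.append(arg)
        entries.append((size, category, args))
    return version, entries
```

A consumer that does not recognise a category can still step over the entry, because the count says how many ulebs to decode before the next entry begins.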

>> >.debug_lld_plt<br>

>> >==============<br>

>> ><br>

>> >The .debug_lld_plt section contains a PLT description. A PLT description<br>

>> >begins with a generic header composed of the following 3 ulebs:<br>

>> ><br>

>> >length (uleb)<br>

>> >- The length in bytes of this PLT description not including the length<br>

>> >field itself.<br>

>> >- This allows for padding to be added to the section, useful for purposes<br>

>> >such as slop for incremental linking.<br>

>> >- The value cannot exceed Elf_Off.<br>

>> ><br>

>> >version (uleb)<br>

>> >- The version of this description information. Currently, 0. The value<br>

>> >cannot exceed Elf_Word.<br>

>> ><br>

>> >type (uleb)<br>

>> >- The type of the PLT being described.<br>

>> >- This affects the interpretation of the remaining description.<br>

>> >- Currently, only PLT_FIXSZ_ENT(value = 0) is defined for describing PLT<br>

>> >sections composed of a header and N fixed size entries.<br>

>> >- The value cannot exceed Elf_Word; although, currently as there is only<br>

>> >one value specified a smaller representation is sufficient.<br>

>> ><br>

>> >PLT_FIXSZ_ENT interpretation<br>

>> >Following the generic header is the PLT_FIXSZ_ENT description header which<br>

>> >is composed of the following 2 ulebs:<br>

>> ><br>

>> >PLT header size (uleb)<br>

>> >- The size of the PLT header in bytes.<br>

>> >- The value cannot exceed Elf_Off.<br>

>><br>

>> >PLT entry size (uleb)<br>

>> >- The size of a PLT entry.<br>

>> >- The value cannot exceed Elf_Word.<br>

>><br>

>> The PLT header size and PLT entry size are hard coded depending on the<br>

>> architecture and a few security related options like -z retpolineplt,<br>

>> ibt, bti. Is a generic description scheme useful?<br>

>><br>

><br>

>It's useful because the description is emitted by the linker rather than<br>

>requiring the consuming tools to be adapted to the linker's output. For<br>

>example, llvm-objdump can generate <symbol>@plt labels for PLT entries when<br>

>disassembling but this doesn't work if  -z retpolineplt is used as the code<br>

>doesn't support that newer type of PLT (<br>

><a href="https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp#L494" rel="noreferrer" target="_blank">https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp#L494</a><br>

>).<br>

<br>

I am concerned that this would add significant complexity to LLD.<br>

<br>

Except for canonical PLT entries (for normal functions and for STT_GNU_IFUNC<br>

symbols converted to STT_FUNC), PLT entries have insignificant addresses and the<br>

linker can generate multiple instances.<br>

For example, the PowerPC64 port PLT is coupled with range extension<br>

thunks and there can be multiple instances.<br>

Each architecture's PLT may have a different shape.<br>

I am not sure how a generic format can describe a stub.<br>

Some architectures can do micro-optimizations like: if we know the hi<br>

part of a pair of hi/lo values is zero, we may save one instruction.<br>

Such a choice is easy to represent in code but difficult to describe<br>

in a serialized format.<br>

<br>

AArch64's BTI PLT is also interesting: some PLT entries may have a<br>

leading `bti c` while some don't.<br>

<br>

x86-64's IBT PLT is worse: there are two sections, .plt and .plt.sec.<br>

How would that be described?<br>

(Multiple folks were against .plt.sec ; I subscribed to x86-64-abi after<br>

this event in case I missed such over-engineering designs in the<br>

future.)<br>

<br>

Describing the PLT/GOT reminds me of supporting a GNU ld --verbose style<br>

linker script dump (<a href="https://bugs.llvm.org/show_bug.cgi?id=51309" rel="noreferrer" target="_blank">https://bugs.llvm.org/show_bug.cgi?id=51309</a>).<br>

Yes, it can make some applications happy but the implementation complexity<br>

would be huge.<br>

<br>

Perhaps what I really want to ask is whether we have run into an XY<br>

problem (<a href="https://xyproblem.info/" rel="noreferrer" target="_blank">https://xyproblem.info/</a>). What did the hot-patching feature<br>

actually need? FWIW such a feature is also implemented in the Linux<br>

kernel, called live-patching, which is related to dynamic ftrace.<br>

So far we haven't heard that they need anything from the linker side.<br>

<br>

Well, a GNU contributor added -z unique-symbol very quickly while the<br>

needs appear to have disappeared :)<br>

<a href="https://bugs.llvm.org/show_bug.cgi?id=50745" rel="noreferrer" target="_blank">https://bugs.llvm.org/show_bug.cgi?id=50745</a><br>

I am sold that this option is misdesigned :)<br>

(<a href="https://maskray.me/blog/2020-11-15-explain-gnu-linker-options#z-unique-symbol" rel="noreferrer" target="_blank">https://maskray.me/blog/2020-11-15-explain-gnu-linker-options#z-unique-symbol</a>)<br>

<br>

><br>

>> If the new format is to describe dynamic relocations in a compact way, I<br>

>> am wondering whether this is over-engineered and whether it can achieve the<br>

>> design goal.<br>

>> A program doesn't typically have many GLOB_DAT, TLSDESC, and TLS GD/LD/IE<br>

>> relocations.<br>

>><br>

><br>

>The purpose is to describe the GOT/PLT in a consistent and simple manner<br>

>for consuming tools. Over the years there have been a number of changes to<br>

>how the GOT is optimised. GOT entries can be patched statically, patched<br>

>with relocations that don't reference dynamic symbols, or patched with<br>

>relocations that reference a dynamic symbol, etc. Using this section allows<br>

>each GOT entry to be consistently described. If we can design a more<br>

>compact format for the same information that would be great.<br>

<br>

Does --emit-relocs help here?<br>

<br>

>> MIPS folks invented DT_MIPS_LOCAL_GOTNO and<br>

>> DT_MIPS_SYMTABNO-DT_MIPS_GOTSYM, but the scheme rarely saves much space<br>

>> and turns out to cause more problems with .gnu.hash<br>

>> <a href="https://sourceware.org/pipermail/binutils/2019-December/109330.html" rel="noreferrer" target="_blank">https://sourceware.org/pipermail/binutils/2019-December/109330.html</a><br>

><br>

><br>

>The .debug_lld_got section doesn't currently handle the MIPS GOT as it is<br>

>much more complicated than other GOTs and there already seemed to be code<br>

>in place to be able to parse and dump it.<br>

><br>

>>The header is then followed by a list of entry descriptions.<br>

>> >- Each entry description is a single uleb and describes the PLT entry with<br>

>> >the same index.<br>

>> >- The value of the uleb gives the index of the associated GOT entry.<br>

>> >- The value cannot exceed Elf_Off.<br>
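As a rough illustration of how a tool might consume a PLT_FIXSZ_ENT description, the sketch below turns the header size, entry size, and per-entry GOT indices into address labels. The function names and the got_symbols mapping (GOT entry index to symbol name, which a real consumer would build from the GOT description and .symtab) are hypothetical and not part of the proposal.

```python
def plt_entry_address(plt_base, header_size, entry_size, index):
    """Address of PLT entry `index` in a header-plus-fixed-size-entries PLT."""
    return plt_base + header_size + index * entry_size


def label_plt_entries(plt_base, header_size, entry_size, plt_to_got, got_symbols):
    """Map each PLT entry address to a "name@plt" label.

    plt_to_got:  one GOT entry index per PLT entry, in PLT order.
    got_symbols: hypothetical dict from GOT entry index to symbol name.
    Entries whose GOT index has no known symbol are left unlabeled.
    """
    labels = {}
    for index, got_index in enumerate(plt_to_got):
        name = got_symbols.get(got_index)
        if name is not None:
            addr = plt_entry_address(plt_base, header_size, entry_size, index)
            labels[addr] = name + "@plt"
    return labels
```

For a conventional x86-64 PLT one would expect a 16-byte header and 16-byte entries, so label_plt_entries(0x1020, 16, 16, [3, 4], {3: "puts", 4: "exit"}) labels 0x1030 and 0x1040.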

>><br>

>> Is disassembling .plt more convenient? The linker uses a predictable way<br>

>> to generate it so its content is not that hard to parse.<br>

>> It can be quick because the shape of a PLT entry is known and many bytes<br>

>> can be skipped.<br>

>> With this in mind, this information is just easy to infer from<br>

>> R_*_JUMP_SLOT relocations.<br>

>><br>

>> >In addition to allowing hot-patching tools to work with the GOT and PLT the<br>

>> >information in these sections is of use to any tool that needs to display<br>

>> >information on the GOT and PLT sections. For example, debuggers and binary<br>

>> >tools synthesize labels of the form <symbol>@plt to label the PLT sections.<br>

>> >The information in these sections could be used to simplify such tasks.<br>

>><br>

>> How is this format more suitable than existing Elf64_Rel/Elf64_Rela for<br>

>> hot-patching? The GOT and PLT information can be inferred from .rela.plt<br>

>> and .rela.dyn easily. The scheme appears to be more complex than the<br>

>> relocation format.<br>
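For comparison, the relocation-based alternative raised here can be sketched as follows. This is an illustration only: it assumes a little-endian ELF64 target, x86-64 relocation numbering, and that the raw bytes of .rela.plt have already been extracted. The function name is hypothetical. The usual convention that the i-th JUMP_SLOT relocation matches the i-th PLT entry is also an assumption that does not hold for every PLT variant discussed in this thread.

```python
R_X86_64_JUMP_SLOT = 7   # x86-64 psABI relocation type
RELA_SIZE = 24           # sizeof(Elf64_Rela): r_offset, r_info, r_addend


def jump_slot_symbols(rela_plt):
    """Return [(got_slot_address, dynsym_index)] for each JUMP_SLOT record.

    rela_plt is the raw content of .rela.plt as bytes (little-endian ELF64).
    """
    out = []
    for off in range(0, len(rela_plt) - RELA_SIZE + 1, RELA_SIZE):
        r_offset = int.from_bytes(rela_plt[off:off + 8], "little")
        r_info = int.from_bytes(rela_plt[off + 8:off + 16], "little")
        r_type = r_info & 0xFFFFFFFF     # ELF64_R_TYPE
        r_sym = r_info >> 32             # ELF64_R_SYM
        if r_type == R_X86_64_JUMP_SLOT:
            out.append((r_offset, r_sym))
    return out
```

Note that the symbol index recovered this way refers to the dynamic symbol table, whereas the proposed sections reference static symbols in .symtab.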

>><br>

><br>

>It's a scheme that describes the GOT and PLT without the consumer needing<br>

>knowledge of other aspects of the dynamic file format such as the dynamic<br>

>relocations and symbols. Referencing static symbols directly avoids any<br>

>ambiguity as to which references caused a GOT entry to be created (matching<br>

>by address may find multiple aliases).<br>

><br>

><br>

>> >On Wed, Sep 15, 2021 at 3:51 AM bd1976 llvm <<a href="mailto:bd1976llvm@gmail.com" target="_blank">bd1976llvm@gmail.com</a>> wrote:<br>

>> ><br>

>> >> Hi All,<br>

>> >><br>

>> >> Sony maintains a downstream patchset to optionally emit additional<br>

>> >> informational sections to the ELF output file created by LLD. These<br>

>> >> sections describe LLD's output and the transformations applied during<br>

>> >> Linking. These additional sections are used with the static symbol<br>

>> >> table (.symtab) to facilitate the operation of hot-patching tools.<br>

>> >><br>

>> >> Our preferences are that:<br>

>> >><br>

>> >> - The information required for hot-patching is stored in the ELF<br>

>> >>   output file as ELF sections, as opposed to being emitted into<br>

>> >>   auxiliary files. Otherwise, customers have to adjust their processes<br>

>> >>   to keep the ELF output file and auxiliary files together when<br>

>> >>   packing/moving the ELF output file and ensure they are correctly<br>

>> >>   matched.<br>

>> >><br>

>> >> - These metadata sections are created by LLD, rather than derived via<br>

>> >>   a post-link procedure. Performance is important, as customers want<br>

>> >>   to be able to enable the emission of hot-patching metadata by<br>

>> >>   default, and having LLD directly emit the required sections is more<br>

>> >>   efficient and a simpler work-flow.<br>

>> >><br>

>> >> The contents of these sections could be seen as debugging information<br>

>> >> for the linking process. Certainly, we would want to handle these<br>

>> >> sections with the same rules that apply to debugging sections when<br>

>> >> manipulating a linked ELF with binary utility tools. For that reason<br>

>> >> the sections are all named .debug_lld_* e.g. .debug_lld_linkmap.<br>

>> >><br>

>> >> Currently, Sony would like to emit the following sections and we<br>

>> >> believe that they are generally useful:<br>

>> >><br>

>> >> - A linkmap section that contains a subset of the information contained<br>

>> >>   in a linker -Map file. This section specifies the linked address for<br>

>> >>   each input section.<br>

>> >><br>

>> >> - A section which specifies the list of wrapped symbols.<br>

>> >><br>

>> >> - A section that describes the GOT. This provides:<br>

>> >> -- A category for each entry, examples: GOT entry, PLTGOT entry, TLS GD<br>

>> >>    entry, LD TLS tls_index structure entry etc..<br>

>> >> -- A slot index at which the entry starts.<br>

>> >> -- A size for the entry, as GOT entries may take more than one GOT<br>

>> >>    slot (e.g. a TLS GD entry takes two slots).<br>

>> >> -- An optional static symbol index to which the GOT entry is associated<br>

>> >>    (some entries e.g. the LD TLS tls_index structure are not associated<br>

>> >>    with a particular symbol).<br>

>> >><br>

>> >> - A section describing the PLT. This section needs to be somewhat<br>

>> >>   flexible to deal with the many different PLTs that exist on ELF<br>

>> >>   toolchains. However, for a fixed size entry PLT description the<br>

>> >>   section will supply:<br>

>> >> -- Which range of bytes comprises the PLT header.<br>

>> >> -- The size of a PLT entry.<br>

>> >> -- For each PLT entry, the GOT slot index of the associated GOT entry.<br>

>> >>    Combined with the information on GOT entries from the GOT description<br>

>> >>    section this allows for the association of a PLT entry with a symbol.<br>

>> >><br>

>> >> Similar to DWARF sections these are non-alloc sections. They are encoded<br>

>> >> as sequences of ULEB128 values. As these are debugging sections, not core<br>

>> >> ELF sections, a compact representation is justifiable, even if the<br>

>> >> encoding is more complex.<br>

>> >><br>

>> >> In order to anchor this discussion I have created<br>

>> >> <a href="https://reviews.llvm.org/D109804" rel="noreferrer" target="_blank">https://reviews.llvm.org/D109804</a><br>

>> >> which contains a prototype implementation of the linkmap section<br>

>> >> referenced above.<br>

>> >><br>

>> >> I would like to ascertain whether the LLVM community would be<br>

>> >> supportive of adding the ability to generate such sections to LLD?<br>

>> >><br>

>> >> Thanks.<br>

>> >><br>

>><br>

>> >_______________________________________________<br>

>> >LLVM Developers mailing list<br>

>> ><a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

>> ><a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

>><br>

>><br>

</blockquote></div>