[llvm] [AArch64, ELF] Allow implicit $d/$x at section beginning (PR #99718)

Mon Aug 12 22:51:48 PDT 2024

MaskRay wrote:

> Some thoughts after thinking about this on the weekend and talking to some colleagues.
> 
> Apologies for the length of the response, a brief summary:
> 
> * I think this could be a useful option if projects build all their objects with it enabled. Would be good to know from potential users that they would enable this option in this review? So far I think it is only me that has commented.
> * My main concern is binary distribution of objects with it enabled, and what future choices this might rule out.
> * I'm still nervous about trailing mapping symbols, particularly with `-fsanitize=undefined` or `-fsanitize=kcfi`.
> * Can we alleviate these concerns by marking object files that use this mapping symbol convention. That way we could fix them up later if needed.
> * Is there an alternative way to do link-time culling of mapping symbols that runs faster?
> 
> In more detail.
> 
> The pragmatist in me says that this is a useful (possibly llvm only [1]) option for people with large builds to enable an alternative mapping symbol definition than the ABI and accept the consequences of that as well as the benefits. The idealist in me says that if this propagates out into binary distributions across multiple projects then we may have incompatibilities with GNU and while there are no plans to add one, it may rule out something like a second instruction set (like Morello or Thumb [2]). [1] the comments on the binutils mailing list don't look enthusiastic. [2] We can't know which instruction set is implicit.

Thanks for the comprehensive summary. I concur with your analysis, including the subsequent points. I believe some users, like mine, are willing to accept the consequences.

> A colleague working on Bolt mentioned that there could be a case for more mapping symbols to help binary analysis, this implies that there could be more mapping symbol density options than just optimize and default. He also mentioned that recording the mapping symbol level choice in the ELF file (mechanism TBD) might be a way around the future compatibility problem, and could extend to the additional mapping symbol cases. Does recording the use of the alternative mapping symbol mechanism appeal at all?
> 
> I'm still nervous about adding trailing `$x`. I had thought that AArch64 executable sections with a leading `$d` would be very rare, but I had forgotten about `-fsanitize=kcfi` and `-fsanitize=undefined`. These all lead with a `$d`. This makes collisions between trailing `$x` and `$d` more likely than I'd first thought.

Yes. [-fsanitize=function](https://maskray.me/blog/2022-12-18-control-flow-integrity#fsanitizefunction) and `-fsanitize=kcfi` emit executable sections with a leading `$d`.

> I also don't think we can rely on the dumb linker collation order putting the trailing `$x` before the leading `$d` of the following section. For example, assuming we have `lld first.o second.o` with the following linker script fragment:
> 
> ```
>   .text : { second.o(.text) first.o(.text) }
> ```
> 
> If `.text` in first.o leads with a `$d`; like when `-fsanitize=kcfi` is used, and `.text` in second.o has a trailing `$x` then I think when we copy local symbols into the symbol table first.o will get copied first, which will be maintained by stable_partition so we'll get `$d` `$x` at the same address with the `$x` . This is all theoretical, I haven't tried testing it yet.

Agreed, this is a potential downside similar to --symbol-ordering-file and [--call-graph-profile-sort](https://maskray.me/blog/2020-11-15-explain-gnu-linker-options#call-graph-profile-sort).
lld will place first.o symbols before second.o symbols, potentially causing the undesired `$d $x` order.
I know my user is willing to accept the consequence.

Fortunately, this risk seems manageable for some users. First, text sections with trailing data are extremely rare.
Even if such a section is followed by a `-fsanitize=function` section (`.long 0xc105cafe; .long 2772461324; code`),
and the two words are misidentified as code, the following code won't, making this an acceptable risk for some users.

> As an aside, I'm surprised the linker removal of redundant mapping symbols took so long. It looks like we already stable_partition the static symbol table in `void SymbolTableBaseSection::finalizeContents()`. Did you modify that to stable_sort rather than partition, or just stable_sort the local symbols after the partition?

I have developed this hack https://github.com/MaskRay/llvm-project/tree/lld-mapsym , 5% slowdown.

```patch

diff --git a/lld/ELF/SyntheticSections.cpp b/lld/ELF/SyntheticSections.cpp
index c27ab2b67dc2..d8e4bc976016 100644
--- a/lld/ELF/SyntheticSections.cpp
+++ b/lld/ELF/SyntheticSections.cpp
@@ -2172,4 +2172,9 @@ void SymbolTableBaseSection::sortSymTabSymbols() {
   getParent()->info = numLocals + 1;

+  for (auto &s : symbols)
+    s.va = s.sym->getVA();
+  std::stable_sort(symbols.begin(), e,
+                   [](auto &l, auto &r) { return l.va < r.va; });
+
   // We want to group the local symbols by file. For that we rebuild the local
   // part of the symbols vector. We do not need to care about the STT_FILE
@@ -2190,5 +2195,5 @@ void SymbolTableBaseSection::addSymbol(Symbol *b) {
   // Adding a local symbol to a .dynsym is a bug.
   assert(this->type != SHT_DYNSYM || !b->isLocal());
-  symbols.push_back({b, strTabSec.addString(b->getName(), false)});
+  symbols.push_back({b, strTabSec.addString(b->getName(), false), 0});
 }

@@ -2266,5 +2271,5 @@ template <class ELFT> void SymbolTableSection<ELFT>::writeTo(uint8_t *buf) {
       if (isDefinedHere) {
         eSym->st_shndx = shndx;
-        eSym->st_value = sym->getVA();
+        eSym->st_value = ent.va ? ent.va : sym->getVA();
         // Copy symbol size if it is a defined symbol. st_size is not
         // significant for undefined symbols, so whether copying it or not is up
@@ -2492,5 +2497,5 @@ void GnuHashTableSection::addSymbols(SmallVectorImpl<SymbolTableEntry> &v) {
   v.erase(mid, v.end());
   for (const Entry &ent : symbols)
-    v.push_back({ent.sym, ent.strTabOffset});
+    v.push_back({ent.sym, ent.strTabOffset, 0});
 }

diff --git a/lld/ELF/SyntheticSections.h b/lld/ELF/SyntheticSections.h
index d4169e1e1aca..c7e385119c03 100644
--- a/lld/ELF/SyntheticSections.h
+++ b/lld/ELF/SyntheticSections.h
@@ -642,4 +642,5 @@ struct SymbolTableEntry {
   Symbol *sym;
   size_t strTabOffset;
+  uint64_t va;
 };
```

> Another possibility that I think would work in the majority of AArch64 cases without costing too much extra link time, which I think could be done before the stable_partition (to speed that up) is to see if each OutputSection always has the same mapping symbol type. For example if the `.text` section only has `$x` mapping symbols, which would be the most common case, then we can replace these with a single `$x` mapping symbol. If there any `$d` in there bail out. I think this could be done much more cheaply than 5% link time.

This approach (deleting extra mapping symbols) is for the default case. Bailing out could hinder a significant size optimization for executables with minor data-in-code elements.
```
default	| covers all cases but too many mapping symbols
```

Alternatively, removing leading `$x` can reduce relocatable file size, which will be preferred by my users.
We have two choices:
```
omit leading $x, trailing $x if required	| could cause $x and $d at same address, error in some tools. or be incorrect when a data section follows a code section with a trailing $x
omit leading $x, no trailing $x	| incorrect when data is last in the section and code follows in next section. Will have double $x when combined with the default option.
```

The first choice has been discussed above. Technically a linker could delete `$x` in a text section if it shares the address with a `$d`.
This feature could be implemented with minimal code, but I believe my users won't require this level of precision.

The second choice "omit leading $x, no trailing $x" requires a smart linker that builds a output-section-to-symbol map (`.text => $x $d $x $d $x`) and inspects input section boundaries.
I believe there is significant performance overhead (sorting), increased memory consumption (tracking input section boundaries without stealing one bit from `lld::ELF::Symbol`), and potential code complexity. 


https://github.com/llvm/llvm-project/pull/99718