[all-commits] [llvm/llvm-project] 46707b: [AArch64, ELF] Allow implicit $d/$x at section begi...

Fangrui Song via All-commits all-commits at lists.llvm.org
Thu Aug 22 09:12:35 PDT 2024


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 46707b0a83b7769965f9b1b3d08b2cc6bd26c469
      https://github.com/llvm/llvm-project/commit/46707b0a83b7769965f9b1b3d08b2cc6bd26c469
  Author: Fangrui Song <i at maskray.me>
  Date:   2024-08-22 (Thu, 22 Aug 2024)

  Changed paths:
    A lld/test/ELF/aarch64-mapsyms-implicit.s
    M llvm/include/llvm/MC/MCAssembler.h
    M llvm/include/llvm/MC/MCTargetOptions.h
    M llvm/include/llvm/MC/MCTargetOptionsCommandFlags.h
    M llvm/lib/MC/MCTargetOptionsCommandFlags.cpp
    M llvm/lib/Target/AArch64/MCTargetDesc/AArch64ELFStreamer.cpp
    M llvm/test/MC/AArch64/mapping-across-sections.s

  Log Message:
  -----------
  [AArch64,ELF] Allow implicit $d/$x at section beginning

The start state of a new section is `EMS_None`, often leading to a
$d/$x at offset 0. Introduce an MCTargetOption/cl::opt
"implicit-mapsyms" to allow an alternative behavior
(https://github.com/ARM-software/abi-aa/issues/274):

* Set the start state to `EMS_Data` or `EMS_A64`.
* For text sections, add an ending $x only if the section does not end with instructions.
* For non-text sections, add an ending $d only if the section does not end with data directives.

```
.section .text.1,"ax"
nop
// emit $d
.long 42
// emit $x

.section .text.2,"ax"
nop
```
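
The rules and the example above can be modeled as a small state machine.
The Python sketch below is only an illustration, not the code in
AArch64ELFStreamer.cpp: the names `State` and `mapping_symbols` and the
`(kind, size)` item encoding are hypothetical, and it assumes one section
is processed at a time.

```python
from enum import Enum


class State(Enum):
    NONE = "none"   # traditional start state (EMS_None)
    DATA = "$d"     # EMS_Data
    A64 = "$x"      # EMS_A64


def mapping_symbols(items, is_text, implicit):
    """Return [(offset, symbol_name)] for a single section.

    items: list of ("insn" | "data", size_in_bytes) in emission order.
    is_text: True for an executable section.
    implicit: True to model the implicit-mapsyms behavior.
    """
    # The state a reader assumes at the section boundary in implicit mode.
    default = State.A64 if is_text else State.DATA
    state = default if implicit else State.NONE

    symbols = []
    offset = 0
    for kind, size in items:
        wanted = State.A64 if kind == "insn" else State.DATA
        if wanted != state:
            # A state change needs an explicit mapping symbol.
            symbols.append((offset, wanted.value))
            state = wanted
        offset += size

    # In implicit mode, restore the default state at the end of the section
    # so the next section can again start without a mapping symbol.
    if implicit and state != default:
        symbols.append((offset, default.value))
    return symbols


text1 = [("insn", 4), ("data", 4)]   # nop; .long 42
text2 = [("insn", 4)]                # nop
print(mapping_symbols(text1, is_text=True, implicit=True))
print(mapping_symbols(text2, is_text=True, implicit=True))
print(mapping_symbols(text2, is_text=True, implicit=False))
```

In this model, .text.2 needs no mapping symbol at all under the new
behavior, while the traditional behavior gives it a $x at offset 0.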

This new behavior decreases the .symtab size significantly:

```
% bloaty a64-2/bin/clang -- a64-0/bin/clang
    FILE SIZE        VM SIZE
 --------------  --------------
  -5.4% -1.13Mi  [ = ]       0    .strtab
 -50.9% -4.09Mi  [ = ]       0    .symtab
  -4.0% -5.22Mi  [ = ]       0    TOTAL
```
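
As a rough sanity check, the .symtab saving can be converted into a symbol
count. The sketch below assumes that bloaty's "Mi" means MiB and that the
whole .symtab decrease comes from dropped mapping symbols; neither is
stated in the measurement above. An ELF64 symbol table entry (Elf64_Sym)
is 24 bytes. The function name is hypothetical.

```python
ELF64_SYM_SIZE = 24      # sizeof(Elf64_Sym) in bytes
MIB = 1024 * 1024


def estimated_symbols_removed(saved_mib):
    """Estimate how many Elf64_Sym entries a .symtab size decrease represents."""
    return saved_mib * MIB / ELF64_SYM_SIZE


# 4.09 MiB is the .symtab decrease reported by bloaty above.
print(round(estimated_symbols_removed(4.09)))
```

Under those assumptions, the decrease corresponds to roughly 179,000
symbol table entries.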

---

This scheme works as long as the user can rule out some error scenarios:

* .text.1 assembled using the traditional behavior is combined with .text.2 using the new behavior
* A linker script combining non-text sections and text sections. The
  lack of mapping symbols in the non-text sections could make them
  treated as code, unless the linker inserts extra mapping symbols.

The above mix-and-match scenarios aren't an issue at all for a
significant portion of users.

A text section may start with data commands in rare cases (e.g.
-fsanitize=function) that many users don't care about. When combining
`(.text.0; .word 0)` and `(.text.1; .word 0)`, the ending $x of .text.0
and the initial $d of .text.1 may have the same address. If both
sections reside in the same file, ensure the ending symbol comes before
the initial $d of .text.1, so that a dumb linker respecting the symbol
order will place the ending $x before the initial $d.

Disassemblers using stable sort will see both symbols at the same
address, and the second will win.
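
The "second will win" rule can be illustrated with a short Python sketch.
It is a simplified model, not the logic of any particular disassembler:
the function name and the `(address, name)` tuples are hypothetical, and
the addresses assume .text.0 is placed at 0x0 with .text.1 directly after
it at 0x4.

```python
def mapping_state_by_address(symbols):
    """Map each address to the mapping symbol in effect from that address.

    symbols: list of (address, name) in symbol table order.
    """
    # sorted() is stable, so symbols with equal addresses keep their
    # symbol table order.
    ordered = sorted(symbols, key=lambda sym: sym[0])
    state = {}
    for address, name in ordered:
        # A later symbol at the same address overwrites the earlier one.
        state[address] = name
    return state


# (.text.0; .word 0) followed by (.text.1; .word 0), implicit-mapsyms on.
symtab = [
    (0x0, "$d"),  # initial $d of .text.0
    (0x4, "$x"),  # ending $x of .text.0
    (0x4, "$d"),  # initial $d of .text.1, same address as the ending $x
    (0x8, "$x"),  # ending $x of .text.1
]
print(mapping_state_by_address(symtab))
```

With the ending $x ordered first, the $d at 0x4 wins and the word in
.text.1 is treated as data. If the two symbols at 0x4 were swapped, the
$x would win and that word would be decoded as an instruction.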

When section ordering mechanisms (e.g. --symbol-ordering-file,
--call-graph-profile-sort, `.text : { second.o(.text) first.o(.text) }`)
are involved, the initial data in a text section following a text
section with trailing data could be misidentified as code, but the issue
is local and the risk could be acceptable.

Pull Request: https://github.com/llvm/llvm-project/pull/99718


