<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/100096>100096</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [llvm-bolt] GOT array pointer incorrectly rewritten
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            BOLT
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          peterwaller-arm
      </td>
    </tr>
</table>

<pre>
    # Problem

* An address alone is not enough to distinguish a pointer-to-data vs pointer-to-function (see test case).
* The `patchELFGOT` function currently assumes values in the GOT are pointers-to-functions, and if the function moves, the address is updated.
* In the case of an ‘end of array’ pointer address coinciding with an address of a function which is moved by bolt, bolt updates the ‘end of array’ pointer in the GOT when it should not.

# Static binary glibc startup crash message

This manifests in glibc static binaries on aarch64-linux with the binary crashing on startup with the error message `Unexpected reloc type in static binary`. What’s happening is that the glibc startup code iterates over an array of reloc, but it runs off the end of the array because the array end pointer which lives in the GOT is no longer valid.

It is currently known to manifest on aarch64-linux with glibc, but underlying issue may not be unique to that target or scenario, it may be at risk of happening elsewhere.

# Test case (copy paste whole block, produces ‘./testcase’)

```
clang -nostartfiles -nodefaultlibs -static \
  -Wl,--emit-relocs \
  -Wl,--section-start=.array=0x1000 \
  -Wl,--section-start=.text=0x1004 \
  -Wl,--section-start=.data=0x800000 \
  -x assembler -o testcase - <<EOF


.section .array, "a", @progbits
.globl array_start
.globl array_end
array_start:
  .word 0
array_end:

.section .text
.globl _start
_start:
  adrp x1, #:got:array_start
  ldr x1, [x1, #:gotpage_lo15:array_start]
  adrp x0, #:got:array_end
  ldr x0, [x0, #:gotpage_lo15:array_end]
  adrp x2, #:got:_start
  ldr x2, [x2, #:gotpage_lo15:_start]
  ret
EOF
```

# `objdump --disassemble-all --reloc testcase` output:

```
0000000000001000 <array_start>:
    1000:       00000000        udf     #0

Disassembly of section .text:

0000000000001004 <_start>:
    1004:       d00000e1        adrp    x1, 1f000 <_start+0x1dffc>
                        1004: R_AARCH64_ADR_GOT_PAGE    array_start
    1008:       f947f421        ldr     x1, [x1, #4072]
                        1008: R_AARCH64_LD64_GOTPAGE_LO15       array_start
    100c:       d00000e0        adrp    x0, 1f000 <_start+0x1dffc>
                        100c: R_AARCH64_ADR_GOT_PAGE    array_end
    1010:       f947f800        ldr     x0, [x0, #4080]
                        1010: R_AARCH64_LD64_GOTPAGE_LO15       array_end
    1014:       d00000e2        adrp    x2, 1f000 <_start+0x1dffc>
                        1014: R_AARCH64_ADR_GOT_PAGE    _start
    1018:       f947fc42        ldr     x2, [x2, #4088]
                        1018: R_AARCH64_LD64_GOTPAGE_LO15       _start
    101c:       d65f03c0        ret

Disassembly of section .got:

000000000001ffc8 <.got>:
        ...

000000000001ffe0 <_GLOBAL_OFFSET_TABLE_>:
        ...
   1ffe8:       00001000        udf     #4096
   1ffec:       00000000        udf     #0
   1fff0:       00001004        udf     #4100
   1fff4:       00000000        udf     #0
   1fff8:       00001004        udf     #4100
   1fffc:       00000000        udf     #0
```

# What breaks

Note above there are GOT entries `1ffe8` (`array_start`), `1fff0` (`array_end`) and `1fff8` (`_start`), and that the address of `array_end` shares its address with `_start` (equal to `00001004`).

Running `llvm-bolt -o testcase.bolt testcase`, BOLT erroneously rewrites both these got entries, the new GOT contains:

```
000000000001ffe0 <_GLOBAL_OFFSET_TABLE_>:
        ...
   1ffe8:       00001000        udf     #4096
   1ffec:       00000000        udf     #0
   1fff0:       00400000        .inst   0x00400000 ; undefined
   1fff4:       00000000        udf     #0
   1fff8:       00400000        .inst   0x00400000 ; undefined
   1fffc:       00000000        udf     #0
```

This is incorrect because `1fff0` has been rewritten to point to the new start, but the array data has not moved.

Code which is doing `for (void *i = array_start; array_start != array_end; i++)` will now run off the end of the array.

# Why does it break

The culprit is the code in `patchELFGOT` which currently assumes that all pointers in the GOT may be interpreted as function pointers:

https://github.com/llvm/llvm-project/blob/5f05d5ec8f9bb15c0ac29fce843a2c73165ac414/bolt/lib/Rewrite/RewriteInstance.cpp#L5271-L5277

# Thinking about solutions

* The case of data and code sharing an address is an awkward edge case. It happens here because it’s valid to have a pointer to ‘one past the end of an array’ (per C standard “Additive operators” regarding integer addition to a pointer), and that pointer can end up being referenced through the GOT.
* This edge case is rare but is at last known to create an issue with glibc static binaries. There could be other binaries in the wild with problems that may be revealed at runtime.
* There is an immediate problem of making glibc static binaries work.
  * @paschalis-mpeis attempted a fix in https://github.com/llvm/llvm-project/pull/97710, but the fix is incomplete in that it does not handle the case in general. I’m not sure about the rationale of the condition used or whether it is valid.
  * We may want to special case somehow this if it leads to a clean immediate solution, with an approach similar to that presented by @paschalis-mpeis above, though the exact conditions are subject for discussion.
* If the type referred to by a GOT entry was known, `patchELFGOT` could query the type and use conditionally use the correct getNewFunctionAddress / getBinaryDataAtAddress in each case.
* What does distinguish `array_end` and `_start` is that the symbols belong to different sections (even though they share an address), and the sections are of different types. If it were possible to determine the symbol referenced in the GOT then patchELFGOT could straightforwardly avoid patching entries which reference data symbols (or patch them if it's moving them).
* I am not currently aware of any way to determine the type of the data referred to by a GOT entry, other than to look for relocations which point to the entry [discussion below].

# Can the symbol referenced by a GOT entry be reconstructed?

So far as I’m currently aware we don’t have a straightforward way to determine which symbol a GOT entry points to. If only the address contained within the GOT entry is considered, this does not distinguish `_start` and `array_end` as in the reproducer provided above.

Determining which relocations point to a given GOT entry could require determining register values for some programs, since we don’t get a fully-qualified pointer to the GOT from a GOT relocation for a specific symbol alone. Consider that you could have a register value holding a base pointer to a GOT page, and that register value gets reused for multiple different GOT references within the page.

The base pointer could be hoisted out of a loop and would not itself contain information connecting it to the symbol of interest, so determining the GOT entry being referenced by a relocation would require some kind of symbolic execution to determine the base register value in the worst case.

cc @aaupov @maksfb for BOLT
cc @MaskRay because I think you might be interested and knowledgable, particularly if there is some information BOLT can use to differentiate symbols which share the same address in the GOT straightforwardly, or if there are alternative approaches to consider in light of additional linker knowledge.
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJzUWktz47iP_zTKBWWXLD9iH3LIo9PbVdnJVE-2-piiJMjihCLVJBXH334LoJ52MpPe3ss_lUpsiQRA4IenJJyTe414Fa1vovXdhWh8aexVjR7tQSiFdiZsdZGa_HgVJUv405pUYRXFd1F83f5NruFag8hzi86BUEYjSAfaeEBtmn0J3kAunZd630hXgoDaSO3RzryZ5cILeHXjS0WjMy-NhijZOkTw6DxkwmGU7OYD06cSIdrEtfBZ-eXh_uvjU7SJod-cNdai9uoIwrmmQgevQjXoQGrwJcLXxycQFjvGbszZRcktCJ2DLHhtT7Qyr8g36Wp3ZOmgqXPhMR9J9y1wIbHBFCA0RF-SaBtHuy3qnC9ZK47dxV0nR081M1JnMpd6DwfpS6LQ3aLNg0yHUmYlCUHC5ZAeITXKk5D0vxXNsTSfFGGkoUOJGqQHV5pG5WTU-dT4S_jLCy8zSKUW9gh7JdMMnBfWNzVkVrgSKnRO7HG88akkgYWWBTrPNuk39sQkOjAahLBZuVnNlNTNW9AFSdfyYw6kJKN7rv0atNbYjj2B5X80vtWYeczBojIZ-GONxH3M9xht4jn8KIXvVeOgFHWNmhhJ0qXwzODktCZHkB4tK9y8kjV1UDJpmzmyXRpPOrWNJlsGiLUWYVzxhhQz0TgcXaElnYmC0ZV8nQKa_Q6U0Xu0hHeZT8z1zdOKwTNetDlocs_OEh_om4_ZSd7oHK06BlW4BqESR_b2FKHR8meDRDGoSNg9ejAWXIZkUUNEpOctKYLwYKV7oYMP-kXl8FCixTOkPXWBgCJDZuoj1MJ5hENpFEKqTPZC9Gtr8iZDN8B9HiX3FEU4iHQ2jZLdhMEmbn_5a6aE3sNMGzZuIRU6-pZjIRrllUwdzFrQROvbsAdg9kNFye1shpX0M7a3e--2Q3bdGdOOlnfz4IjLu_htEcfxp7Z4fPPdjtWndlCo5R3bOD7l8kZREqtUoYWZgU5ZMINoeRstb7883k-UxX_nLQto5U9uIUoSESUJf1zFtTX7VHrXLt8rk6oA5ucg1fl11Hm4Ol62vO4EnR-MzSEeL6Edyw_kYiWNmYz5nhEXua3hbRHOsYyW13tDt88EBlC57Raub0521GKPz8os1idb13dTPvH7fHoFtFzijkv8L1xo4ymP5JTHO8dIOgbJhwzOTmCxpTHgYuo-g9NGm9ikf-dNVcNslkvX4WwmlILZrA3DnXNuYjCNrxt_YtIT8vHoJ3jM8nYCmC8jswLQkmjJHwGg29h-hSYv-H-ULCfS3_XCcviegmoq3ok8K5LnQ1FWgyg578BFJwpbDaDF1qJoj9aSSm7it0VeFBkR7Sm-99Nx-f58ff399r82q-fru-_PXx-fnv-8_vqFOZ2DmrdtB-GK3eqyWCW9cISWQbgx8FfxZTJCx4cybacyPdxtViQUyfT88LhYd1r4QLbsTHHxmeLi31Zc9hnFjbyUNi3iE7VtB3j1ajvz5FW8jT-htkD8s2o7lewcbcmZ0pLfVdriX9B2bszFKdCyVXKmsbPQtIq3289o7JNAe0esMcY26yJeZr0h-6D3z-EhBNoPosOiKLItKTksm8YG-pnP5x9vxWCfrw-PN9cPz4_39399eXp-ur55-PL8j7ToaEWB22kMXLwfA1fxbjPZlv1K6Ax7iviM1eo9Vot4um31f2B1fqpPsfqlU32Y3ahLgNSieHHjW38YjyBS88r1u0XuNKlAR-25r4k2cbDIJqZiNtrE46C3iak4JcjzsiI-XUZOzou4S21XjYid0qFVfdMyaiNPCIIrhaWOwrt-FTcAY5LEAn82QlGZH236nBe4TdD7vdFc0kebWKnXasYN6ai8nPOFUfInWW8eH564c9NoGqeOYPFgJXVUqQl9nUPY83SBddm14xoPrOLMaC-kdp-uIP5zHWs12TOX2nm6_tbfiZY33K8VUmP-_-tpv8H7912Pxwf0qzNjLWa-b5fHHlMKBymibhHkkXtdbqFDixpA06a70N0O7TZPpogE9bY8W5lg-5aa_X72kpsW54Wx5CCvRuYQJdcSouXdpKBZ3oy_QpQshhXcytyAjJIb_t3RKQ5SKdDmALbRH44LznrlH-URcsO-HOLTVH0IWaNqK30YZ2A7u9DvTNTCGc_HaRxPqIjvJmjjQUTb4PON2qLHHIQbZlbdlhMfLb2v-VpyHyX3e-nLJp1npoqSe4of7b9Zbc3fmPkouU-VSaPkfl3E63yN2bbYpelincUiS3ZFhtvVUiTZ5XKxWYtstVjRDp6N3StJ-76HyDJ8-qadFzrDeVbXUbJ8WCeXixn9vTybRZRSv5DNRWoaD86oJkwOT0ajT6MhICOKQjErm2ItE9DjSSJ9O7wchM0B833YO4dvvh2QOOB00qFdjkdUPO8hYJfiFYcpK4fpbhJiNPLQZAyibkY1DAKjZFujhVtyDZ2TLN292-s8l16-IpgarfDGuu7WHVjcC8sTS2K8D8NMyQb3ZhDoNCd1cmZCs0hNDSkSFYsFWtQZ0kIbRskBX5MxsHSDqkiFlpItz9gcCA-KjttPujKLgnKzbodXw3zrdPY4J9tZ8oxG5QRmQ6l8GE22aD9IlQcqdRiOt57ReoDFVxSK4M8DPy8rnM6wLbZml1WFuSThWkJkm0owyN4fjh6MfekTEZGLVnEtXFYKJd2sqpEV4LGq2f-gkG8k9S97Wd0oFSX3u8vLRTyOk0wvBOGqVugx6ETwdJODD4XOUuhc4TANlxr2qNEKNYdvPeYqXusai61H0XorCDxCYRfrMqNbQDUOczAWDiWyWUIkG488g0p-hPnkQYSY72rMpFBBEmcqLM0BPGeTgmgoFLkLaM0UTqzSuThpoJ_I17U1IivByUoqYfvJZ23RofZhHv-eWag2DJVLj2t8E5kfjui4aHRNSlYAyiu5dFnjnDR6_KAhaIYH2ewwFjkIpEcQfclJCnDBCdqy8iTIB5D_bNAeB3LkoRRlepGEUkfohtJd5t2j_wMP921ov25DWZTc050bHqjfCS-ufXdLakDSGYe24SBcSjNqxg-LTivUtt4dqtHxON4dq9QoSvrK6H147lRwDPFdf-a4fH2lUqDX_DHUvaNIPAlROOylVRTIe6qkJzcnI0gPB-TnSc7JVPEQPEePtpIaR9KNo9ooYfoSNYyM0lrEeSvkvvSFsZQTKP1yacEreVreNhQhS_e0Q67p9BElW2PDHmJUBbBHySU_NCIydHX6eO0biOCTo8x_aM8vNAHqeH5Ehk3rqyzBx4gk_YaI6kvBoVkZ88I455mgCBoP55rUbAHQ0fpm8Ae2-CFa353VQbdCf6D8EwfhWJ0Z7bxtMo95tJxMvP8yUAhLRcw4Zp3q5oCQG90v8F0qPjHjufLCMVshx2LxySkiMciMVsdJD9c2OxhS0AhPYbfkFU7maDEP4YaL1TY0nzja4FStl009r095FttHLJZS1avMKbtQQJto_649HD_BbNE52LW3qIC9JHcchA7It_izkRZ7HYWCYC-dD4-2GnQMFgriJMbeiop7QScJ_2eW2KPnp6ZKHWfUvcpCYj6ukTrNFdZUrQkGgZmVCOmjkFlvKWU0zuG21XGIREfTtGdorT8VG0qjuEgSkFIWGokQuNZij5MC6WT_Hr0Di5wBSayqUV7WCkdhKQjfQt2NsUHE56e9wESOvt4pDXHNgbIxP3JWxtQs1aF7GAzSO1RFB0OQujC2ChrLjNYUN6kc7H231ZspQmuAjhsvZyZmnmL4rBZkxx2Z5jDBC-PhRYbSNrCTGeAbZk1XiU5jFh_-RMVdbWds-8RxorIso5QuRFObV_pUiRdXpGyLm8eHp_Gi_xbu5fvoae438kD9whipKCL0XRKyrkm7lKYV5nuRKsZBLayXWaOEVcf2jYRQNPJRxyrn0QmV0ZyjR9kvFDBtNmhjDac8tomoRi8zDCHkLPdwyLaDCJwzlUerBbcEXTWEXEF1gYdIKj4qgSjvKglQUr-g7U-L84v8apnvljtxgVeLy2RxuYxXu_iivFomi2K7We7Wu9XlLk_ydYxput2KpEgQ88XmQl4lcbKKL5PlIk62i3gu1osdLpf5dr3FeJ0n0SrGSkg1p8p2buz-gqv_q0Ucx7vNhRIpKscvwSQJmzBJovXdhb0KY6tm76JVrKTzbqDgpVf84kw_2orWd-1rJVYcR29TtJVSP8ryqC8aq65-uRBnoV2U3Ldyv14l_xsAAP__5NYS0g">