<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/116817>116817</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64] BOLT asserts when attempting to encode out-of-range Pending Relocations
</td>
</tr>
<tr>
<th>Labels</th>
<td>
BOLT
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
paschalis-mpeis
</td>
</tr>
</table>
<pre>
## What is the problem
When a function is excluded from BOLT optimization, BOLT emits relocations to patch the original fixups in the original text. These relocations are generated during external reference scanning, an optimization that redirects calls in the original code so that they land back in the optimized code. The relevant relocations are of type `R_AARCH64_CALL26`.
**In large binaries the encoding does not fit the supported range, leading to an [assertion](https://github.com/llvm/llvm-project/blob/67e733d36ef0c5ec9fab899ecf9f191d83c7d418/bolt/lib/Core/Relocation.cpp#L387):**
> only PC +/- 128MB is allowed for direct call
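As a rough illustration of the bound in that assertion message, the sketch below checks whether a direct call fits and then encodes it. It is Python used purely for the arithmetic (BOLT itself is C++), and the names `fits_call26` and `encode_bl` are made up for this example. It assumes the standard AArch64 `BL` layout: opcode `0x94000000` with a signed 26-bit word offset in the low bits.

```python
# Illustrative arithmetic only; this is not BOLT's implementation.
CALL26_LIMIT = 128 * 1024 * 1024  # 2**27 bytes, reach of a signed 26-bit word offset


def fits_call26(target: int, pc: int) -> bool:
    """True if a BL placed at `pc` can reach `target` directly."""
    offset = target - pc
    return CALL26_LIMIT > offset >= -CALL26_LIMIT


def encode_bl(target: int, pc: int) -> int:
    """Encode `BL target` for an instruction located at `pc`."""
    if not fits_call26(target, pc):
        raise ValueError("only PC +/- 128MB is allowed for direct call")
    imm26 = ((target - pc) >> 2) & 0x03FFFFFF  # word offset, two's complement
    return 0x94000000 | imm26
```

Under this layout the reachable offsets run from -128MB up to 128MB minus 4 bytes, so a call site that ends up further than that from its new target cannot be patched in place.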
Ignoring functions can be common. Some reasons include:
- they are explicitly listed in `--skip-funcs` (as in the reproducer below)
- they are cold functions in split mode
- they are implicitly ignored for other reasons
---
## Reproducer
### How the reproducer triggers the crash
The crash is triggered through instrumentation, because of the resulting code size increase.
Instrumenting MongoDB only succeeds if BOLT ignores some functions. Ignoring them causes additional relocations to be emitted. However, instrumentation increases code size by more than 3x, which puts some targets beyond the range of those relocations. Encoding such relocations (`R_AARCH64_CALL26`) then fails with an assertion.
### Why BOLT requires some functions to be ignored
This requirement was introduced by PR [#89681](https://github.com/llvm/llvm-project/pull/89681). Before that stricter check, those functions did not have to be ignored and the bug was not triggered (i.e., on LLVM 18 RC). PR [#101466](https://github.com/llvm/llvm-project/pull/101466) improves support for functions with entry points other than the main one, but BOLT still bails out in our use case.
Regardless, this issue will not focus on such improvements, as the error can be triggered independently of them.
### Reproducer Instructions
- Tested on: Ubuntu 22.04.5 LTS (6.8.0-1018-aws)
Use this [Dockerfile](https://gist.githubusercontent.com/paschalis-mpeis/9eb878f73e18fb9d3f996ae7c59d4792/raw/5baef8dc9acdd1cc74800aa5d3542993c926129d/Dockerfile) and the script below to compile the input binary and copy it to an AArch64 Ubuntu host.
Alternatively, one can get it pre-compiled from [here](https://gist.github.com/paschalis-mpeis/9eb878f73e18fb9d3f996ae7c59d4792/raw/e9de7728a517e153d4980225ecac92d2e0923498/mongod.tar.gz).
```bash
# Compile mongodb v7.0.5 with clang 18.1.8 (stage-0):
docker build --progress=plain --tag 'tmp-mdb-stage-0' .
# Then copy the binary from the container to the host (preferably Ubuntu 22.04/24.04)
docker run --rm --entrypoint cat tmp-mdb-stage-0 mongo/build/install/bin/mongod > mongod
chmod +x mongod
# Once the binary is retrieved, the image can be deleted
docker rmi tmp-mdb-stage-0
```
Then, compile BOLT version `dcd62070cf45` **with assertions ON** and use it to instrument the binary:
```bash
LLVM_DIR=path/to/llvm/bin/dir
$LLVM_DIR/llvm-bolt mongod -instrument -o mongod.instrumented \
--instrumentation-file=prof.fdata \
--instrumentation-sleep-time=60 \
--instrumentation-no-counters-clear \
--instrumentation-wait-forks \
--skip-funcs=_ZL9InterpretP9JSContextRN2js8RunStateE/1,_ZN2v88internal12_GLOBAL__N_18RawMatchIhEENS0_19IrregexpInterpreter6ResultEPNS0_7IsolateENS0_9ByteArrayENS0_6StringENS0_6VectorIKT_EEPiiiijNS0_6RegExp10CallOriginEj/1,_ZN2v88internal12_GLOBAL__N_18RawMatchIDsEENS0_19IrregexpInterpreter6ResultEPNS0_7IsolateENS0_9ByteArrayENS0_6StringENS0_6VectorIKT_EEPiiiijNS0_6RegExp10CallOriginEj/1
```
---
## Proposed Solutions
A draft PR around (S1) will follow.
#### (S1) Apply pending relocs only when in bounds
Keep the 'external reference scan' optimization on, and apply it on a per-call-site basis.
That is, if a particular external reference is within range, its relocation is encoded; otherwise it is skipped.
This may require checking that the PendingRelocations were added specifically for this optimization, i.e. during external reference scanning, and that they are of type `R_AARCH64_CALL26`.
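A minimal sketch of the (S1) filtering logic, written in Python only to show the decision (BOLT itself is C++). `PendingReloc`, its fields, and `split_pending` are hypothetical stand-ins and do not correspond to BOLT's actual classes; in particular, the `from_scan` flag assumes BOLT can tell which pending relocations came from external reference scanning.

```python
from dataclasses import dataclass

CALL26_LIMIT = 128 * 1024 * 1024  # 2**27 bytes


@dataclass
class PendingReloc:  # hypothetical stand-in, not BOLT's Relocation class
    call_site: int   # address of the BL in the original text
    target: int      # address of the callee in the optimized code
    rel_type: str    # e.g. "R_AARCH64_CALL26"
    from_scan: bool  # True if added by external reference scanning


def in_call26_range(reloc: PendingReloc) -> bool:
    offset = reloc.target - reloc.call_site
    return CALL26_LIMIT > offset >= -CALL26_LIMIT


def split_pending(relocs):
    """Return (to_apply, to_skip); only optional CALL26 relocs may be skipped."""
    to_apply, to_skip = [], []
    for reloc in relocs:
        optional = reloc.from_scan and reloc.rel_type == "R_AARCH64_CALL26"
        if optional and not in_call26_range(reloc):
            to_skip.append(reloc)   # leave the original call untouched
        else:
            to_apply.append(reloc)  # required relocs keep today's behaviour
    return to_apply, to_skip
```

Relocations that are not known to be optional are still applied in this sketch, so an out-of-range one would still reach the existing check rather than being silently dropped.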
#### (S2) Hint users to use `-no-scan`
Instead of the [current assertion](https://github.com/llvm/llvm-project/blob/67e733d36ef0c5ec9fab899ecf9f191d83c7d418/bolt/lib/Core/Relocation.cpp#L387), BOLT could exit with an error that recommends using `-no-scan` to work around the crash. This workaround is not obvious unless someone is familiar with the specific BOLT code regions. Also, by always bailing out, would we ensure that BOLT does not patch the code incorrectly?
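To illustrate why an explicit error may be safer than an assertion, the Python sketch below shows what a 26-bit encoding does to an out-of-range offset if no check runs at all, which is typically the case for assertions in builds without them enabled. Whether BOLT would actually reach this point in such a build is an assumption, and the helper names and the error text are invented for this example.

```python
CALL26_LIMIT = 128 * 1024 * 1024  # 2**27 bytes


def encode_unchecked(offset: int) -> int:
    """Encode a BL with no range check; high offset bits are simply dropped."""
    return 0x94000000 | ((offset >> 2) & 0x03FFFFFF)


def decode_offset(insn: int) -> int:
    """Recover the signed byte offset that the encoded BL would branch by."""
    imm26 = insn & 0x03FFFFFF
    if imm26 >= 0x02000000:  # bit 25 is the sign bit
        imm26 -= 0x04000000
    return imm26 * 4


def range_error(offset: int):
    """Return a hypothetical user-facing error for an out-of-range call, else None."""
    if CALL26_LIMIT > offset >= -CALL26_LIMIT:
        return None
    return (f"call offset {offset:#x} exceeds +/- 128MB; "
            "consider re-running with -no-scan")
```

For example, an offset of exactly +128MB encodes to a `BL` that decodes as -128MB, so the patched call would silently go to the wrong place; reporting the error up front avoids that.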
Q: What if an 'external reference' points to a cold function? Is that pending relocation still flushed?
I assume that even if it is, it will have no effect.
</pre>