[llvm] [AArch64][BOLT] Ensure tentative code layout for cold BBs runs. (PR #96609)

Paschalis Mpeis via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 15 07:44:49 PDT 2024


paschalis-mpeis wrote:

@maksfb I played around with the test of #112110, and below are my findings.


In summary, with random splitting alone, no stubs are inserted, and no relocation is needed or performed. So as expected, the JITLink crash you observe does not go away.

It's only when I attach profile data that parts of LongJmp are engaged (incl. my patch). The JITLink crash is then is gone, but not due to my patch. In this simple/small test even with a wrong cold layout BOLT is not relaxing any branches. There is an example at the very end.

# Details
The current patch (#96609) avoids relaxing (with `relaxStubToShortJmp`) any branches that tentative code layout predicts they are not needed.

When trying the test of #112110, the flags `--split-functions --split-strategy=randomN` do not trigger `relaxStub`. Therefore, my proposed change does not run, no stubs are [added](https://github.com/llvm/llvm-project/blob/d03fa48eb1a8866eda24d8bed9165a9e28c0a3ce/bolt/lib/Passes/LongJmp.cpp#L606), and no relaxation is [needed](https://github.com/llvm/llvm-project/blob/d03fa48eb1a8866eda24d8bed9165a9e28c0a3ce/bolt/lib/Passes/LongJmp.cpp#L619).
**Was something different expected and could it be an issue with the random splitting strategy?** 🤔 

BTW, this was the reason for not using random splitting in my test, but instead a [profile](https://github.com/llvm/llvm-project/blob/d03fa48eb1a8866eda24d8bed9165a9e28c0a3ce/bolt/test/AArch64/split-funcs-lite.s#L8) with [internal labels](https://github.com/llvm/llvm-project/blob/d03fa48eb1a8866eda24d8bed9165a9e28c0a3ce/bolt/test/AArch64/split-funcs-lite.s#L21).

---
### Running test of #112110
When running the test (just w/o the compact-flag of course), the [fix-point loop](https://github.com/llvm/llvm-project/blob/d03fa48eb1a8866eda24d8bed9165a9e28c0a3ce/bolt/lib/Passes/LongJmp.cpp#L642) does a single iteration on `runOnFunctions` for both `_start`, and `large_function`, which results to no stub insertions and no relaxation:
> Inserted 0 stubs in the hot area and 0 stubs in the cold area. Shared 0 times, iterated 1 times.

As expected, the JITLink crash appears, either with or without this patch:
> BOLT-ERROR: JITLink failed: In graph in-memory object file, section .text: relocation target .text.cold + 0x4 at address 0x724fc0 is out of range of CondBranch19PCRel fixup at 0x600004 (_start, 0x600000 + 0x4)

### What is the intention of the #112110 test?
Using your patch, in the bolted binary we get something like:
```armasm
Disassembly of section .text:

0000000000600000 <_start>:
  600000: cmp	x1, #1
  600004: b.hi	0x60000c <_start+0xc>
  600008: b	0x724fc0 <_start.cold.0>
  60000c: b	0x724fc4 <_start.cold.0+0x4>

0000000000600010 <large_function>:
  600010: nop
...
  724f8c: nop
  724f90: ret

Disassembly of section .text.cold:

0000000000724fc0 <_start.cold.0>:
  724fc0: bl	0x600010 <large_function>
  724fc4: ret
```

And this aligns with your comments in the test. So we have:
- in the hot section: the main fragment `_start`  and the `large_function`
- in the cold section: the non-main fragment of `_start`

With the surrounding code as is, I only achieve this kind of splitting when I change your test to attach some profile data on the `_start` entry-point and the `large_function`. In that case, the **JITLink error is gone** and the resulting code is:
```armasm

Disassembly of section .text:

0000000000600000 <_start>:
  600000: cmp	x1, #0x1
  600004: b.ls	0x60000c <_start+0xc>
  600008: b	0x724fc4 <_start.cold.0+0x4>
  60000c: b	0x724fc0 <_start.cold.0>

0000000000600010 <large_function>:
  600010: nop
...
  724f8c: nop
  724f90: ret

Disassembly of section .text.cold:

0000000000724fc0 <_start.cold.0>:
  724fc0: bl	0x600010 <large_function>
  724fc4: ret
```

Note that the branching condition in the above snippet is switched to `b.ls` when compared to the original test. That must be due to the profile data I used that indicate a call to `large_function` is the more likely branch target.

So when profile data are attached, wider code of LongJmp runs (incl. my change), a couple of stubs are inserted, and the JITLink crassh is gone. But as far as my patch is concerned, in this simple/small example it has no effect.

For example, without my patch we'll get addresses (1):
- [SrcAddr](https://github.com/llvm/llvm-project/blob/d03fa48eb1a8866eda24d8bed9165a9e28c0a3ce/bolt/lib/Passes/LongJmp.cpp#L484): `0x600008`                                                         
- [TgtAddr](https://github.com/llvm/llvm-project/blob/d03fa48eb1a8866eda24d8bed9165a9e28c0a3ce/bolt/lib/Passes/LongJmp.cpp#L483): `0x4 `


and with my patch we'll get (2):
- SrcAddr: `0x600008`
- TgtAddr: `0x724fc4`

(1) is within bounds, so correcting the cold blocks estimated address has no effect.



https://github.com/llvm/llvm-project/pull/96609


More information about the llvm-commits mailing list