<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/64633>64633</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[BOLT][AArch64] Implementation of option `--plt` on AArch64
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Kepontry
</td>
</tr>
</table>
<pre>
I'd like to discuss some questions that came up while implementing BOLT's `--plt` option on AArch64. Here is the test program, written in C:
```C
#include <stdio.h>
int main(){
printf("Hello World\n");
return 0;
}
```
On x86-64, the `printf` call is compiled into a call to the `puts` entry in the `.plt` section (the compiler turns a `printf` of a constant string ending in `\n` into `puts`):
```asm
40058f: e8 fc fe ff ff callq 400490 <puts@plt>
```
The first instruction of the `puts@plt` entry (at 0x400490) is an indirect jump to the implementation address stored in the GOT entry of `puts` (0x600fd8):
```asm
Disassembly of section .plt:
0000000000400480 <.plt>:
400480: ff 35 42 0b 20 00 pushq 0x200b42(%rip) # 600fc8 <_GLOBAL_OFFSET_TABLE_+0x8>
400486: ff 25 44 0b 20 00 jmpq *0x200b44(%rip) # 600fd0 <_GLOBAL_OFFSET_TABLE_+0x10>
40048c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000400490 <puts@plt>:
400490: ff 25 42 0b 20 00 jmpq *0x200b42(%rip) # 600fd8 <puts@GLIBC_2.2.5>
400496: 68 00 00 00 00 pushq $0x0
40049b: e9 e0 ff ff ff jmpq 400480 <.plt>
```
The `--plt` option uses the function `convertCallToIndirectCall` to fuse the `callq` (at 0x40058f) and the `jmpq` (at 0x400490) into a single `callq` through the GOT entry (at 0xa000ef in the rewritten binary). This new `callq` replaces the original one, which reduces the number of instructions executed:
```asm
a000ef: ff 15 e3 0e c0 ff callq *-0x3ff11d(%rip) # 600fd8 <puts@GLIBC_2.2.5>
```
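On AArch64, however, there is no instruction that calls through an address stored in memory. The call site branches to `puts@plt` with a plain `bl`, and the PLT entry then uses four instructions (0x400540 to 0x40054c) to do the equivalent work: `adrp` and `ldr` load the target from the GOT entry into `x17`, `add` leaves the address of that GOT entry in `x16`, and `br` jumps to the target.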
```asm
400694: 97ffffab bl 0x400540 <puts@plt>
```
```asm
0000000000400540 <puts@plt>:
400540: 90000110 adrp x16, 0x420000 <puts@GLIBC_2.17+0x420000>
400544: f9400e11 ldr x17, [x16, #0x18]
400548: 91006210 add x16, x16, #0x18
40054c: d61f0220 br x17
```
So my question is: should we replace the original `bl` instruction with these four instructions (an optimization similar to the x86 one), or simply give up on the `--plt` option on AArch64?
</pre>