[llvm] [BOLT] Improve handling of relocations targeting specific instructions (PR #66395)

Fri Sep 15 01:16:20 PDT 2023

mtvec wrote:

> Thanks! Do I understand correctly that this is from the linker's standpoint? In the final binary, ld/addi would have just an imm operand. But since we need the lo12 offset between the 1st instruction and the symbol, we're addressing the 1st instruction (the second might have another HI address), and the linker, knowing where we're addressing, would find the 1st instruction, its relocation, and its symbol to complete the lower part of the address.

That all sounds correct.

> If I'm right, this scheme is so weird to me. I mean on AArch64 we're doing:
> adrp x0, symb // PC-relative high 20 bit of address. I think it is ~ to auipc

It's similar to `auipc` but not the same. `adrp` clears the lower 12 bits of the result while `auipc` doesn't, so the latter doesn't result in a 4KiB-aligned address. This also means that `auipc` can, for example, be used to calculate the value of PC.
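
To make the difference concrete, here is a minimal Python sketch that models only the address arithmetic of the two instructions on a 64-bit target. The function names and the `sign_extend` helper are made up for illustration; they are not part of BOLT or LLVM.

```python
MASK64 = (1 << 64) - 1  # model 64-bit registers


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def auipc(pc, imm20):
    """RISC-V: rd = pc + (sext(imm20) << 12). The low 12 bits of pc are kept."""
    return (pc + (sign_extend(imm20, 20) << 12)) & MASK64


def adrp(pc, imm21):
    """AArch64: rd = (pc with low 12 bits cleared) + (sext(imm21) << 12)."""
    return ((pc & ~0xFFF) + (sign_extend(imm21, 21) << 12)) & MASK64


pc = 0x10234
print(hex(auipc(pc, 0)))  # 0x10234: with a zero immediate, auipc yields the PC itself
print(hex(adrp(pc, 0)))   # 0x10000: adrp always yields a 4KiB-aligned page address
```

With a zero immediate, `auipc` returns the PC unchanged while `adrp` returns the base of the current page, which is the behaviour described above.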

> // Any count of instructions
> add x0, x0, symb // lo-12 part of address is offset in 4k page.
>
> Since we're addressing inside the 4k page, this offset would never change. So really only the adrp is PC-relative; the second one is a page-relative offset. But RISC-V for some reason doesn't seem to want to use such a scheme, and at link time it "truly" calculates the lower part of the offset using the 1st instruction as a base and its relocation to find the symbol. I really had to strain my brain to understand this scheme).

As far as I can tell, using something like `adrp` wouldn't work on RISC-V since the I- and S-type instructions it would need to pair up with take a *signed* 12-bit immediate, so they cannot represent arbitrary page offsets. On AArch64 this can work since these instructions have multiple addressing modes, including one with unsigned 12-bit offsets.
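
As a rough illustration of the immediate-range issue, the Python sketch below splits a PC-relative offset into a hi20/lo12 pair the way a signed low part requires (rounding the high part by `0x800`), and checks which values fit a signed versus an unsigned 12-bit field. The helper names are hypothetical and only model the arithmetic, not any actual linker or BOLT code.

```python
def split_hi_lo(offset):
    """Split an offset so that offset == (hi20 << 12) + lo12 with a signed lo12."""
    hi20 = (offset + 0x800) >> 12  # round to nearest so lo12 lands in [-2048, 2047]
    lo12 = offset - (hi20 << 12)
    return hi20, lo12


def fits_signed_12(value):
    """Range of an I-/S-type immediate on RISC-V."""
    return -2048 <= value <= 2047


def fits_unsigned_12(value):
    """Range needed to express any offset within a 4KiB page."""
    return 0 <= value <= 4095


hi20, lo12 = split_hi_lo(0x12FFC)
print(hex(hi20), lo12)           # 0x13 -4: the high part is rounded up, the low part goes negative
print(fits_signed_12(0xFFC))     # False: a page offset in the upper half of the page does not fit
print(fits_unsigned_12(0xFFC))   # True: it would fit an unsigned 12-bit field
```

Because the low part is computed relative to where the `auipc` executed rather than to a page base, the linker needs to locate that `auipc` and its relocation to fill in the second instruction's immediate.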

https://github.com/llvm/llvm-project/pull/66395