[PATCH] D145687: [BOLT] Add minimal RISC-V 64-bit support

Job Noorman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 27 04:35:42 PDT 2023


jobnoorman added inline comments.


================
Comment at: bolt/lib/Rewrite/RewriteInstance.cpp:2642
+            RelSymbol->getObject()->section_end()) {
+      auto *Symbol = BC->getOrCreateUndefinedGlobalSymbol(SymbolName);
+      ContainingBF->addRelocation(Rel.getOffset(), Symbol, Rel.getType(), 0,
----------------
rafauler wrote:
> I'm probably not getting the full picture here, so correct my thinking below.
> 
> So, if I understand correctly, you have an ABS __global_pointer$ symbol that points outside any section that BOLT has loaded (otherwise ReferencedSection wouldn't be null here). Therefore, BOLT won't change the location of this symbol (if it is pointing to some unknown region of memory that is not covered by any section known by BC->getSectionForAddress(SymbolAddress).
> 
> So why register this reloc, if we are not changing this symbol? We will arrive at link step and RuntimeDyld will resolve this unknown symbol to zero, no? How do we know the correct value of this symbol?
> 
> If we don't create the relocation here (with addRelocation), we will disassemble the original code that is already patched with the correct value for this symbol -- at least for X86. For RISC, this is harder to do if the real address is split in different instructions, and we don't have the luxury of ignoring the relocation. So my feeling is that, at the very least, this code is RISC-only, but you didn't protect this code from running in the X86 backend.
Thanks for the extensive review!

> So, if I understand correctly, you have an ABS __global_pointer$ symbol that points outside any section that BOLT has loaded (otherwise ReferencedSection wouldn't be null here). Therefore, BOLT won't change the location of this symbol (if it is pointing to some unknown region of memory that is not covered by any section known by BC->getSectionForAddress(SymbolAddress).

Correct. Except that `__global_pointer$` should be handled even if it happens to point to a section. I'll look into a way to fix this.

> So why register this reloc, if we are not changing this symbol? We will arrive at link step and RuntimeDyld will resolve this unknown symbol to zero, no? How do we know the correct value of this symbol?

The current behavior of this patch is that `__global_pointer$` keeps its original value. I just noticed that this works kind of accidentally because symbols that don't reference a section are registered for "pretty printing" [here](https://github.com/llvm/llvm-project/blob/9580bebd47808b58f8ab833d71d2d38976caa9b8/bolt/lib/Rewrite/RewriteInstance.cpp#L1041-L1048).

Now, whether this is correct or not is a good question. The linker is supposed to pick the value for `__global_pointer$` when performing [Global-pointer relaxation](https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#global-pointer-relaxation). Since linker relaxation is not addressed by this patch (`-Wl,--no-relax` is currently mandatory to use it), I don't have an answer to this question yet. (*)

> If we don't create the relocation here (with addRelocation), we will disassemble the original code that is already patched with the correct value for this symbol -- at least for X86. For RISC, this is harder to do if the real address is split in different instructions, and we don't have the luxury of ignoring the relocation.

The problem is that the relocation against `__global_pointer$`  is typically used by a PC-relative code sequence like this (copied from the link above):

```
1:  auipc gp, %pcrel_hi(__global_pointer$) # R_RISCV_HI20
    addi  gp, gp, %pcrel_lo(1b)            # R_RISCV_LO12_I
```

So even though we don't change the value of `__global_pointer$`, the location of this code sequence *can* change and hence the immediate values in it need to be patched again. Therefore, I think adding this relocation is necessary.

> So my feeling is that, at the very least, this code is RISC-only, but you didn't protect this code from running in the X86 backend.

I think it's true this code will currently only ever get triggered on RISC-V and I don't mind protecting it as such. However, do you think it will do the wrong thing for an X86 binary containing an ABS symbol? If not, it might make sense to not protect this code to reduce the number of target-specific code paths.

(*) The fact that we don't support linker relaxation yet got me thinking about whether we could skip handling of `__global_pointer$` altogether in this patch. This should work in principle since the only place it's used in the binaries I've tested is during the initialization of `gp` in GCC's startup routines; `gp` is never actually read from in the rest of the binary. The problem is that we then should also ignore the second relocation (R_RISCV_LO12_I) in the example above or else RuntimeDyld gets angry. It feels like any way to filter-out this particular case of R_RISCV_LO12_I would be quite hacky so it will probably be preferable to not do this. I'll look into it though.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145687/new/

https://reviews.llvm.org/D145687



More information about the llvm-commits mailing list