[PATCH] D145687: [BOLT] Add minimal RISC-V 64-bit support
Job Noorman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 28 01:51:17 PDT 2023
jobnoorman added inline comments.
================
Comment at: bolt/lib/Rewrite/RewriteInstance.cpp:2642
+ RelSymbol->getObject()->section_end()) {
+ auto *Symbol = BC->getOrCreateUndefinedGlobalSymbol(SymbolName);
+ ContainingBF->addRelocation(Rel.getOffset(), Symbol, Rel.getType(), 0,
----------------
rafauler wrote:
> jobnoorman wrote:
> > rafauler wrote:
> > > I'm probably not getting the full picture here, so correct my thinking below.
> > >
> > > So, if I understand correctly, you have an ABS __global_pointer$ symbol that points outside any section that BOLT has loaded (otherwise ReferencedSection wouldn't be null here). Therefore, BOLT won't change the location of this symbol (if it is pointing to some unknown region of memory that is not covered by any section known by BC->getSectionForAddress(SymbolAddress).
> > >
> > > So why register this reloc, if we are not changing this symbol? We will arrive at link step and RuntimeDyld will resolve this unknown symbol to zero, no? How do we know the correct value of this symbol?
> > >
> > > If we don't create the relocation here (with addRelocation), we will disassemble the original code that is already patched with the correct value for this symbol -- at least for X86. For RISC, this is harder to do if the real address is split in different instructions, and we don't have the luxury of ignoring the relocation. So my feeling is that, at the very least, this code is RISC-only, but you didn't protect this code from running in the X86 backend.
> > Thanks for the extensive review!
> >
> > > So, if I understand correctly, you have an ABS __global_pointer$ symbol that points outside any section that BOLT has loaded (otherwise ReferencedSection wouldn't be null here). Therefore, BOLT won't change the location of this symbol (if it is pointing to some unknown region of memory that is not covered by any section known by BC->getSectionForAddress(SymbolAddress).
> >
> > Correct. Except that `__global_pointer$` should be handled even if it happens to point to a section. I'll look into a way to fix this.
> >
> > > So why register this reloc, if we are not changing this symbol? We will arrive at link step and RuntimeDyld will resolve this unknown symbol to zero, no? How do we know the correct value of this symbol?
> >
> > The current behavior of this patch is that `__global_pointer$` keeps its original value. I just noticed that this works kind of accidentally because symbols that don't reference a section are registered for "pretty printing" [here](https://github.com/llvm/llvm-project/blob/9580bebd47808b58f8ab833d71d2d38976caa9b8/bolt/lib/Rewrite/RewriteInstance.cpp#L1041-L1048).
> >
> > Now, whether this is correct or not is a good question. The linker is supposed to pick the value for `__global_pointer$` when performing [Global-pointer relaxation](https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#global-pointer-relaxation). Since linker relaxation is not addressed by this patch (`-Wl,--no-relax` is currently mandatory to use it), I don't have an answer to this question yet. (*)
> >
> > > If we don't create the relocation here (with addRelocation), we will disassemble the original code that is already patched with the correct value for this symbol -- at least for X86. For RISC, this is harder to do if the real address is split in different instructions, and we don't have the luxury of ignoring the relocation.
> >
> > The problem is that the relocation against `__global_pointer$` is typically used by a PC-relative code sequence like this (copied from the link above):
> >
> > ```
> > 1: auipc gp, %pcrel_hi(__global_pointer$) # R_RISCV_HI20
> > addi gp, gp, %pcrel_lo(1b) # R_RISCV_LO12_I
> > ```
> >
> > So even though we don't change the value of `__global_pointer$`, the location of this code sequence *can* change and hence the immediate values in it need to be patched again. Therefore, I think adding this relocation is necessary.
> >
> > > So my feeling is that, at the very least, this code is RISC-only, but you didn't protect this code from running in the X86 backend.
> >
> > I think it's true this code will currently only ever get triggered on RISC-V and I don't mind protecting it as such. However, do you think it will do the wrong thing for an X86 binary containing an ABS symbol? If not, it might make sense to not protect this code to reduce the number of target-specific code paths.
> >
> > (*) The fact that we don't support linker relaxation yet got me thinking about whether we could skip handling of `__global_pointer$` altogether in this patch. This should work in principle since the only place it's used in the binaries I've tested is during the initialization of `gp` in GCC's startup routines; `gp` is never actually read from in the rest of the binary. The problem is that we then should also ignore the second relocation (R_RISCV_LO12_I) in the example above or else RuntimeDyld gets angry. It feels like any way to filter-out this particular case of R_RISCV_LO12_I would be quite hacky so it will probably be preferable to not do this. I'll look into it though.
> Oh I see, thanks for explaining. I was afraid to push this because I thought we would resolve these symbols to zero, but since we're loading them correctly in discoverFileObjects, then it makes sense to push this to X86 too. Can you update the pretty printing comment to reference this patch and how it is important to handle relocs against these symbols?
>
> To push this to X86, I would like this code to be more guardrails. First, is there a way to check that no undefined symbols are being handled here? Second, I really don't like the name "getOrCreateUndefinedGlobalSymbol" because this symbol is supposedly defined. Can you use "BC.Ctx->lookupSymbol()" and explicitly search for the symbol you need, and drop the relocation in case the symbol is not there? That's because, in case the symbol is not there, you will create an undefined symbol, which ideally we shouldn't have. You can even add an assertion here for RISCV only, in case it is global_pointer and you couldn't find it., to keep track of your support for it.
> Oh I see, thanks for explaining. I was afraid to push this because I thought we would resolve these symbols to zero, but since we're loading them correctly in discoverFileObjects, then it makes sense to push this to X86 too. Can you update the pretty printing comment to reference this patch and how it is important to handle relocs against these symbols?
Done!
> To push this to X86, I would like this code to be more guardrails.
I rewrote the implementation of ABS symbol handling and I believe it's more robust now. Note that it moved a bit up because of this so you won't find the implementation here anymore.
> First, is there a way to check that no undefined symbols are being handled here?
The code will now not trigger anymore for symbols that were never registered in `discoverFileObjects`. However, I believe that the way ABS symbols are detected there (checking if they don't have a section) would also match UND symbols. I'm not even sure if it's possible to make the distinction between ABS and UND symbols here. I feel like this shouldn't be a big problem though as UND symbols shouldn't happen in practice. Would you agree?
> Second, I really don't like the name "getOrCreateUndefinedGlobalSymbol" because this symbol is supposedly defined. Can you use "BC.Ctx->lookupSymbol()" and explicitly search for the symbol you need, and drop the relocation in case the symbol is not there? That's because, in case the symbol is not there, you will create an undefined symbol, which ideally we shouldn't have. You can even add an assertion here for RISCV only, in case it is global_pointer and you couldn't find it., to keep track of your support for it.
I managed to remove the need to create symbols entirely by reusing the symbol lookup that already happened earlier in the `handleRelocation` method.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D145687/new/
https://reviews.llvm.org/D145687
More information about the llvm-commits
mailing list