[llvm] [BOLT] Improve handling of relocations targeting specific instructions (PR #66395)

Thu Oct 5 05:09:09 PDT 2023

mtvec wrote:

> Adding data structures to BinaryFunction class is a bit expensive for us because it increases our memory footprint

Alright, that makes sense. I got rid of `BinaryFunction::InstructionLabels` as follows:
- While analyzing relocations, simply add instruction references to the `Relocations` map. This is slightly awkward for two reasons (but still workable imo):
   - We cannot add a symbol to the relocation since we have nowhere to store it. We cannot store it in the reloc itself because multiple relocs might want to point to the same symbol. Relocations without symbols didn't happen before (afaict), so I had to add a null-check to an unrelated debug output.
   - In order to be able to reconstruct the symbol reference later, we need to know the address the reloc points to. This is typically calculated by extracting the value. However, in the case of instruction-referencing relocs like `PCREL_LO*` on RISC-V, the extracted value is unrelated to the referenced symbol. To solve this, I simply set the value of the reloc to the address of the symbol it points to.
- While disassembling, keep track of instruction labels in a map:
   - Whenever we encounter an instruction with an instruction-referencing reloc, look up the label it points to in the map (or create a new one) and replace its immediate with a symbol ref as usual.
   - After having disassembled all instructions, iterate the map to add the labels to the corresponding instructions.

So basically, the `InstructionLabels` map is moved to the stack of `disassemble()` at the cost of a slightly awkward representation of some relocs.
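
To illustrate the idea, here is a minimal standalone sketch of the two passes. It is not the actual BOLT code: `Reloc`, `Inst`, the `.Linst_` label naming, and this toy `disassemble()` signature are all hypothetical stand-ins, and the only assumption carried over from the description above is that an instruction-referencing reloc has no symbol and stores the target instruction's address in its value.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not BOLT's types.
struct Reloc {
  const std::string *Symbol; // null for instruction-referencing relocs
  uint64_t Value;            // for those relocs: address of the target instruction
};

struct Inst {
  std::string Operand; // raw immediate or symbolic reference
  std::string Label;   // empty if no label is attached
};

// Toy disassembler. Relocs maps an instruction's address to its reloc.
// Returns the instructions keyed by address.
std::map<uint64_t, Inst>
disassemble(const std::vector<uint64_t> &Addresses,
            const std::map<uint64_t, Reloc> &Relocs) {
  std::map<uint64_t, Inst> Insts;
  // Lives only for the duration of this call instead of in a class member.
  std::map<uint64_t, std::string> InstructionLabels;

  // Pass 1: replace immediates with references to (possibly new) labels.
  for (uint64_t Addr : Addresses) {
    Inst I{"imm", ""};
    auto RelIt = Relocs.find(Addr);
    if (RelIt != Relocs.end() && RelIt->second.Symbol == nullptr) {
      uint64_t Target = RelIt->second.Value;
      auto LabelIt = InstructionLabels.find(Target);
      if (LabelIt == InstructionLabels.end()) {
        // First reference to this target: create its label (made-up naming).
        LabelIt = InstructionLabels
                      .emplace(Target, ".Linst_" + std::to_string(Target))
                      .first;
      }
      I.Operand = LabelIt->second;
    }
    Insts.emplace(Addr, I);
  }

  // Pass 2: attach each label to the instruction at its target address.
  for (const auto &Entry : InstructionLabels) {
    auto InstIt = Insts.find(Entry.first);
    if (InstIt != Insts.end())
      InstIt->second.Label = Entry.second;
  }
  return Insts;
}
```

In this sketch, two relocs that target the same address end up sharing one label, which is the reason the symbol cannot live in the reloc itself.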

https://github.com/llvm/llvm-project/pull/66395