[PATCH] D154604: [BOLT] Calculate output values using BOLTLinker

Maksim Panchenko via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 13 14:02:21 PDT 2023


maksfb added inline comments.


================
Comment at: bolt/lib/Core/BinaryContext.cpp:192
+  // the linker will never see them.
+  Ctx->setAllowTemporaryLabels(false);
+
----------------
jobnoorman wrote:
> maksfb wrote:
> > Background: we've discussed offline that this change causes a noticeable BOLT runtime regression.
> > 
> > The alternative approach for mapping input to output addresses used for address translation, will be to generate a map in a section that we can read and discard later (i.e. skip writing to a file). The map/section contents will be an ordered list of pairs `<InputAddress, OutputAddress>`. For output address we will emit the temporary symbol to the section. I believe `BOLTLinker` should be able to process it.
> > 
> > With the approach above, we can continue to emit all internal symbols as temps and avoid the regression.
> > The alternative approach for mapping input to output addresses used for address translation, will be to generate a map in a section that we can read and discard later (i.e. skip writing to a file). The map/section contents will be an ordered list of pairs `<InputAddress, OutputAddress>`. For output address we will emit the temporary symbol to the section. I believe `BOLTLinker` should be able to process it.
> 
> Interesting idea! I made a PoC implementation and have the following preliminary performance results:
> - With `--enable-bat --update-debug-sections`: +4% total runtime
> - Without: negligible overhead.
> 
> So the overhead is much smaller than for this patch (11%), but it's still non-zero. Would you still agree that the preferred approach (as discussed during our call) is to use this patch but only emit the temp symbols for the targets that need it (i.e., RISC-V)?
> 
> One thing I noticed is that the format you proposed for the map section is quite similar to the BAT section BOLT emits with `--enable-bat`. I wonder if it could be worth trying to emit the BAT section //directly// instead of the intermediary map you proposed. That way, we might get rid of having to manually update BB output values. However, I guess it would mean that we'd have to generate the BAT section in some cases even without `--enable-bat` (e.g., to update debug info).
In your PoC, did you get rid of `LocSyms` and `InputOffsetToAddressMap`? I expect the generation of the map section will have a similar overhead to the one when populating those structures, so hopefully there will be no overhead at all when those structures are gone.

It should be possible to generate BAT directly and then maybe perform some post-processing (if required). At the moment, there are other types of metadata that rely on the address translation, e.g. SDT and pseudo probes. It will take some time to get rid of AT completely. I'm trying to avoid this mechanism while adding new rewriters for the kernel. Once the new replacement process is streamlined, we can refactor the legacy code.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154604/new/

https://reviews.llvm.org/D154604



More information about the llvm-commits mailing list