[PATCH] D120781: [IRLinker] materialize Functions before moving any

Fri Mar 11 13:35:52 PST 2022

dexonsmith added a comment.

Thinking about this more, I think the materializing-too-much explosion will be a problem for use cases other than LTO, and not just for weak functions.

Consider a JIT that initially materializes only `main` and codegens with callback stubs for any calls. When a stub is called, the JIT materializes the called function, patches the stub, and continues, pulling in functions one at a time. This pre-materialization logic would cause the JIT to eagerly materialize everything that `main` transitively references.

That would be a huge regression. This needs to be significantly narrower.
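
To make the JIT scenario concrete, here is a toy model in Python. It is only a sketch: `ToyJIT`, `transitive_references`, and the call graph are hypothetical names for illustration, not LLVM APIs. It contrasts materializing a function on its first call with materializing everything reachable from `main` up front.

```python
# Toy reference graph: each function maps to the functions it references.
# All names are hypothetical; this models the idea, not LLVM's JIT.
CALL_GRAPH = {
    "main": ["parse", "run"],
    "parse": ["lex"],
    "lex": [],
    "run": ["helper"],
    "helper": [],
}


class ToyJIT:
    """Materializes a function only the first time it is actually called."""

    def __init__(self, graph):
        self.graph = graph
        self.materialized = set()

    def call(self, name):
        if name not in self.graph:
            raise KeyError(name)
        # Stub behavior: materialize on first call; callees stay as stubs.
        self.materialized.add(name)


def transitive_references(graph, root):
    """Everything an eager pre-materialization pass would pull in."""
    seen = set()
    worklist = [root]
    while worklist:
        name = worklist.pop()
        if name in seen:
            continue
        seen.add(name)
        worklist.extend(graph[name])
    return seen


jit = ToyJIT(CALL_GRAPH)
jit.call("main")
jit.call("parse")
lazy = jit.materialized                             # only what actually ran
eager = transitive_references(CALL_GRAPH, "main")   # the whole reachable graph
```

In this model the lazy path has materialized two functions after two calls, while the eager path materializes all five before anything runs.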

I've also changed my opinion from the comment in https://github.com/llvm/llvm-project/issues/52787#issuecomment-1010314787, where I said:

> This could work, but I think it should be possible to fix BitcodeReader/IRMover to do the right thing without adding backedges.
>
> [...]
>
> If we did need something new serialized, maybe it could be limited to a bit per basic block saying "is this basic block referenced by a blockaddress outside this function?" (instead of a backedge, just a bit saying there is an edge), and then add ValueHandles for those basic blocks when materializing them to track what they get RAUW'd to.
>
> But I'd much prefer to avoid having the function content in bitcode depend on what references it. Seems like the IRMover and BitcodeReader should be able to track this somehow. E.g., maybe the caller of IRMover should tell BitcodeReader that the basic blocks were moved to another function.

I've realized through this review (thanks @nickdesaulniers and @tejohnson for all the iteration!) that:

- A function being `blockaddress`-able ties it tightly to anything that points at it.
- The only way to avoid a materialization explosion is to record backedges.

(Sorry for the poor guidance!)

Here's what I suggest:

- Add to bitcode a way to say that materializing a function requires materializing another one. Maybe this could be a new, arbitrary-size record in the `FUNCTION_BLOCK`, emitted only when the list is non-empty, which lists the functions that need to be materialized when this one is.
- Use this feature to add edges in both directions every time one function has a `blockaddress` to another one. Probably not too hard; during value enumeration of function bodies, create a map of which functions need to be materialized with each other (based on observed `blockaddress`es), then use that to generate the records during bitcode emission.
- Change the lazy materializer to use this information somehow (maybe this is the hard part).
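
The three steps above could be modeled roughly as follows. This is a Python sketch under stated assumptions, not the bitcode writer or reader: `build_comaterialization_map` and `functions_to_materialize` are hypothetical names, functions are plain strings, and the choice to follow the recorded edges transitively is an assumption about how the lazy materializer might use them.

```python
from collections import defaultdict


def build_comaterialization_map(blockaddress_uses):
    """Record an edge in both directions for every cross-function blockaddress.

    blockaddress_uses is an iterable of (user, target) pairs, meaning that
    `user` contains a blockaddress pointing into `target`. This stands in for
    what value enumeration would observe (hypothetical input shape).
    """
    together = defaultdict(set)
    for user, target in blockaddress_uses:
        if user == target:
            continue  # a function taking its own block's address needs no record
        together[user].add(target)
        together[target].add(user)
    return together


def functions_to_materialize(requested, together):
    """Return `requested` plus everything tied to it by the recorded edges.

    Assumption: edges are followed transitively, so a chain of blockaddress
    references is materialized as one group.
    """
    result = set()
    worklist = [requested]
    while worklist:
        fn = worklist.pop()
        if fn in result:
            continue
        result.add(fn)
        worklist.extend(together.get(fn, ()))
    return result


# `foo` and `baz` both take the address of a block in `bar`.
edges = build_comaterialization_map([("foo", "bar"), ("baz", "bar")])
group = functions_to_materialize("foo", edges)   # foo, bar, and baz together
alone = functions_to_materialize("main", edges)  # main has no records
```

A function with no records materializes by itself, which is the property that keeps the JIT case from regressing.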

WDYT?

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D120781