[PATCH] D120781: [IRLinker] materialize Functions before moving any

Wed Mar 2 10:50:45 PST 2022

nickdesaulniers added a comment.

In D120781#3353252 <https://reviews.llvm.org/D120781#3353252>, @dexonsmith wrote:

> In D120781#3353199 <https://reviews.llvm.org/D120781#3353199>, @nickdesaulniers wrote:
>
>> In D120781#3353144 <https://reviews.llvm.org/D120781#3353144>, @dexonsmith wrote:
>>
>>> Thanks for working on this; it's pretty hairy.
>>>
>>>> This confuses BitcodeReader, which cannot disambiguate whether a
>>>> blockaddress is referring to a function which has not yet been parsed
>>>> ("materialized") or is simply empty because its body was spliced out.
>>>
>>> Another way to solve this would be to mark the function somehow (either intrusively, somehow, or by adding it to a set / map). Can you contrast with that approach?
>>
>> BitcodeReader could maintain a set of all functions it's imported, so that it could do the right thing here (rather than assume a function being empty means not yet parsed);
>> https://github.com/llvm/llvm-project/blob/1e16272ba793e5a6e7308898ecf6ef0dc99e2ad3/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L3037-L3066
>
> Can it also check `isMaterializable()` to differentiate? Is the "materializable" bit reset after materializing, allowing us to differentiate between "empty becuase it's materializable" and "empty because content got moved away"?

Playing with this idea; I don't think maintaining a set of parsed functions will work; for the `@x`/`@y` case, `@x` has already been moved; there's no way to find the `%label` in `@x`. (Dumping the ir prints `call void @f(i8* blockaddress(@x, <badref>))` since IRMover has spliced out the body of `@x`, we can't find the Value `%label`).

> Not entirely sure this'll work, because of the cycle case (testcase `@c` and `@d`)... but maybe if you know it's been moved, there's a way to look up the location of the new definition? (is the valuemapper available?)

The value mapper is not available to the bitcode reader, AFAICT. The bitcode reader doesn't know about value mapping; IRMover does. IRMover is using BitcodeReader to materialize functions then splice+map values from one source module to another destination module.

Maybe passing an optional mapper into bitcode reader than querying that is the way to go?  That feels odd, since the mapper is (IIUC) mapping Values from one Module to another, while BitcodeReader AFAICT is only concerned ATM with one single Module.  I'm not sure that the mapper will handle mapping a Value from the destination module (since I suspect it only works with values from the source module as inputs).

> (If this isn't going to materialize anything *extra*, it seems fine to materialize in an earlier loop (your patch).)

I think your early concern "can the worklist loop below add new functions to the worklist, which haven't been materialized ahead of time" is accurate; I think more unmaterialized functions can be. Items are added to the Worklist via `IRLinker::maybeAdd()`. `IRMover` calls this in its constructor, but also via a callback declared in `IRLinker::shouldLink`. `IRLinker::AddLazyFor` is a callback, which comes from `IRMover::move`. This callback is only defined to do anything in `ModuleLinker::run`. `ModuleLinker::addLazyFor` seems like it is adding items to the worklist (IIUC) (possibly multiple for COMDAT). (FWIW, there are 2 other callers of `IRMover::move` that both use empty callbacks that don't appear to add more values to the Worklist: `LTO::linkRegularLTO` and `FunctionImporter::importFunctions`. So `ModuleLinker::run` is the case in which unmaterialized functions MIGHT be added to the Worklist, IIUC).

> Another potential concern: what happens with weak linkage? I.e., what if you have:
>
>   declare void @sink(i8*)
>   define linkonce_odr void @haslabel() {
>     br label %label
>   label:
>     ret void
>   }
>   define void @usesblockaddress() {
>     call void @sink(i8* blockaddress(@haslabel, %label))
>     ret void
>   }
>
> What if this TU's `@haslabel` is not selected by the linker? (Maybe this is just a failed link, since there's no guarantee the other `@haslabel` still has a basic block, or that it means the same thing, since it could be optimized away if its TU didn't have a `blockaddress` pointing at it.)

The above test case currently runs through `llvm-link` just fine, regardless of my patch.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120781/new/

https://reviews.llvm.org/D120781