[PATCH] D120781: [Bitcode] materialize Functions early when BlockAddress taken

Duncan P. N. Exon Smith via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 7 16:28:47 PDT 2022


dexonsmith added a comment.

In D120781#3437465 <https://reviews.llvm.org/D120781#3437465>, @nickdesaulniers wrote:

> there's some tests failing where the "use list" (???) seems to be incorrect

I suspect these are "use-list order" tests, which confirm that we can round-trip to bitcode without changing the use-list order.

Quick summary:

- Every `Value` has an intrusive linked list of its `Use`s -- i.e., where it's used as an operand.
- Some algorithms walk a `Value`'s use-list, and their output can be influenced by the order that the `Use`s are listed.
- This order is deterministic, but arbitrary somewhat arbitrary.
- Bitcode serialization has an optional feature to serialize / recover this use-list order. This is important for reproducing some (rare) bugs, or and for guaranteeing that serializing/deserializing has no side effects.
- To minimize overhead, the algorithm only stores diffs vs. the "natural" use-lists you'd get by running `materializeAll()`. (Side note: this feature is optional because the storage overhead is a bit too high. If we could guarantee that no one looked at use-lists of pure constants like `i1 0`, `ptr null`, and `i32 0`, it would likely be cheap enough to store all the time.)

(Incidentally, that feature is the only reason I know anything about `blockaddress`...)

Anyway, if you stop calling `materialize()` from inside `parseFunctionBody()`, you'll stop changing the use-list order, and these tests should stop bothering you.

> some blockaddresses are being deserialized with the basic block name changed to a number from a string that I don't yet understand.

I'm guessing it's not safe to call `materialize()` from `parseFunctionBody()`; it's maybe not re-entrant, and something is getting corrupted?

Have a look at how blockaddress forward references are handled; if you follow a similar path for these discovered back-edges the corruption might go away.



================
Comment at: llvm/lib/Bitcode/Reader/BitcodeReader.cpp:4324-4343
+      // When we have the RARE case of a BlockAddress Constant that is not
+      // scoped to the Function it refers to, we need to conservatively
+      // materialize the referred to Function, regardless of whether or not
+      // that Function will ultimately be linked, otherwise users of
+      // BitcodeReader might start splicing out Function bodies such that we
+      // might no longer be able to materialize the BlockAddress since the
+      // BasicBlock (and entire body of the Function) the BlockAddress refers
----------------
I think it'd be better to save these in a list to iterate through after finishing materializing whatever's already "scheduled".

I think the right place to run the code is probably in `materializeForwardReferencedFunctions()`, which is already doing stuff for block addresses (in the other direction!). This is called from two places:
- `BitcodeModule::getModuleImpl()`, which only calls it if lazy-loading -- i.e., if NOT calling `materializeAll()`. I am guessing that lazy-loading needs this because block addresses can be referenced by something that isn't lazily loaded (not sure what that is; maybe globals or some metadata?).
- `BitcodeReader::materialize()`, at the end.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120781/new/

https://reviews.llvm.org/D120781



More information about the llvm-commits mailing list