[PATCH] D118416: [Metadata] Use temporary MD nodes when appending module flags during module linking

Mon Jan 31 12:37:32 PST 2022

dexonsmith added subscribers: aprantl, dblaikie.
dexonsmith added a comment.

Seems like a good problem solve. IIUC, the root problem is that there are O(N) "versions" of the module flags being created, something like:

- `Flags1 = !{!1}`
- `Flags2 = !{!1, !2}`
- `Flags3 = !{!1, !2, !3}`
- ...
- `FlagsN = !{!1, !2, !3, ... !N}`

IIUC, the status quo (without this patch) is that all these iterations of `Flags` are "leaked" onto the LLVMContext.

- With this patch, each iteration is instead made a temporary node so it can be deleted after. That seems reasonable (although the delete operation seems a bit error-prone).
- This patch does not fix the root problem, which is that "append" is O(N) instead of amortized O(1). As a result, O(N^2) work is still being done here.

I think it'd be better to fix the workload to use a "vector" data structure.

One approach would be to (carefully) add a capacity and resize operation to `MDNode` operands:

- Change `MDNode` to allow hung-off operands; e.g., steal a bit from `NumOperands` to indicate whether hung-off, and store the pointer to operands and a capacity where co-allocated operands usually go.
- Add a function `MDNode::resize()` that only appropriate subclasses enable (only some `MDNode` subclasses would want to enable this feature). Asserts if the subclass doesn't allow it, and asserts if the node is `Uniqued` (i.e., only legal for `Temporary` or `Distinct`, which have a real "identity" and are not stored in uniquing sets).
- Ensure that an empty temporary or distinct node can be resized later by co-allocating enough space; as a side effect, even an empty node would start with some small storage, but that's probably okay. (An empty uniqued node wouldn't need anything co-allocated.)
- (The capacity of a distinct node would not be serialized in textual IR or bitcode...)
- (Probably there are other implementation strategies.)

With `MDNode::resize()` in place:

- Change module flags in "append" or "append-unique" mode to use a `distinct` metadata node to store the list of nodes. The flag tuple that references it should be `distinct` as well. (`distinct` means "don't store in the uniquing table"; appropriate here since the content can change.)
- Ideally, add an IR verifier check that these nodes are distinct. Fix textual IR testcases to use `distinct` in these places and add a bitcode upgrade to apply `distinct` retroactively.
- During module linking, call `resize()` (and/or a derived `push_back()`) to extend the existing metadata node in-place rather than creating a new one. (If the "ideal" verifier/upgrade step above is skipped, then the append operation would need to check "are the right nodes `distinct`?" and if not create new `distinct` ones that can be modified in place.)

A second approach would be to add first-class support for module flags. E.g., could create an `MDModuleFlag` metadata type to use instead of the 3-part tuple. This node type could support the necessary operations efficiently.

A third approach would be to not fix the IR at all, but instead would be to change `llvm::Module` to have a "lazy module flags" mode. When enabled, `!llvm.module.flags` is NOT kept up-to-date with changes. Append and AppendUnique flags that have been modified are stored in staging data structures (maybe, SmallVector and SmallSetVector, respectively). LTO could put the module into this "lazy" mode before linking, and put it back into "normal" mode after it was finished linking to commit the changes.

IMO, the first approach I mention above would be best (possibly with the second approach as a follow up!) since the `resize()` operation would be generally useful; there are lots of metadata nodes that are really just lists of arbitrary size, and that get extended later on (e.g., @dblaikie @aprantl, I think there are places in debug info IR where new nodes are allocated for this sort of thing, is that right?).

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118416/new/

https://reviews.llvm.org/D118416