[PATCH] D124548: [Metadata] Add a resize/reserve capability for MDNodes

Duncan P. N. Exon Smith via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 29 00:01:55 PDT 2022


dexonsmith added a comment.

Thanks for working on this! I think this will be a big improvement.

I have a few detailed suggestions inline (probably way too detailed, but I'm not sure I'll be super responsive in the next couple of weeks and didn't want to block you; don't feel you need to match my exact suggestions if you have a better idea).

I suggest splitting into multiple patches:

1. Prep: Fix `MDOperand(MDOperand&&)` (etc.) so that `std::move()` is efficient.
2. Prep: Refactor MDNode to clean up current allocation a bit, and to front-load NFC changes to keep (3) as clean as possible. (See `operator delete` comment for details.)
3. The heart of this patch: make MDTuple resizable with a `resize()` API.
4. Optional: Add the `append()` APIs (not sure they're needed but seem nice to have).
5. Follow-up: improve module append flags, using (3) and/or (4).

For (3), the heart of this patch, a couple of specific comments to call out:

- I think the API should be `protected` in MDNode and only exposed as `public` in `MDTuple` (for now; matching the assertion).
- Might be cleaner to offload growth/capacity logic to SmallVector or similar (when we're in hung-off territory); see `MDNode::reserve()` for details.

> In order to distinguish the two tiny node types, we use the concepts 'tiny resizable' and 'tiny fixed' MDNodes (in addition to large and small). Note that this extra complexity could be eliminated if we're OK with allocating extra space even for uniqued tiny nodes. Only 2-3% of MDNodes seem to have 0 operands, so maybe this would not be a big deal.

I think it's okay to allocate some extra space. But I think we can factor three modes ("fixed", "dynamic:small", and "dynamic:large") without too much complexity. "fixed" and "dynamic:small" can have identical layout; just one is allowed to resize and the other isn't.
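For instance, a rough sketch (the enum and its names here are hypothetical, not from the patch):
```
lang=c++
// Hypothetical sketch: one small tag distinguishes the three modes.
// "Fixed" and "DynamicSmall" share the co-allocated layout; only the
// dynamic modes are allowed to resize.
enum class OperandStorageKind : uint8_t {
  Fixed,        // co-allocated operands, resizing disallowed (uniqued nodes)
  DynamicSmall, // same layout as Fixed, but resizing is allowed
  DynamicLarge  // operands live in hung-off storage
};
```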

> I named the resize method 'reserve' because we don't support shrinking anything.

That seems like an arbitrary restriction. Why not shrink?

I feel like it would be simpler to *only* have `resize()` (not reserve or append, although those could be built on top I suppose).

- If resize grows, it adds `nullptr` arguments. The `MDNodeTest.Print` unit test confirms my memory that `nullptr` is a valid operand value.
- If resize shrinks, it drops the refs. The allocation never shrinks, but I don't see why we'd arbitrarily restrict it not to be allowed.
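Put together, usage would look something like this (a sketch, assuming `resize()` ends up public on `MDTuple`; `SomeMD` is a placeholder for any `Metadata *`):
```
lang=c++
MDTuple *N = MDTuple::getDistinct(Context, None);
N->resize(4);                     // grows; operands 0-3 become nullptr
N->replaceOperandWith(0, SomeMD); // fill in as needed
N->resize(1);                     // shrinks; drops the refs to operands 1-3
```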



================
Comment at: llvm/include/llvm/IR/Metadata.h:950
+    unsigned NumOperands;
+    MDOperand Operands[1];
+  };
----------------
I think(?) it's technically UB (or at least relies on a compiler extension) to use `[1]` for an array that's actually bigger. I suggest leaving this out, but adding an accessor that does the math:
```
lang=c++
MutableArrayRef<MDOperand> getOperands() {
  return makeMutableArrayRef(reinterpret_cast<MDOperand *>(this + 1), NumOperands);
}
ArrayRef<MDOperand> getOperands() const {
  return const_cast<HungOffOperandStorage *>(this)->getOperands();
}
```
and you can add a static assert that the alignment will be okay (if it fails some day we can add `alignas` noise).
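For the static assert, something along these lines should do (a sketch; the condition is that operands can legally start at `this + 1`):
```
lang=c++
static_assert(alignof(HungOffOperandStorage) >= alignof(MDOperand) &&
                  sizeof(HungOffOperandStorage) % alignof(MDOperand) == 0,
              "add alignas(MDOperand) to HungOffOperandStorage if this fires");
```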


================
Comment at: llvm/include/llvm/IR/Metadata.h:953-960
+  unsigned Info; // Storage for the following bitfields.
+
+  using AllocType =
+      Bitfield::Element<AllocationType, 0, 2, AllocationType::Large>;
+  /// The capacity of an MDNode with co-allocated operands.
+  using CoallocCapacity = Bitfield::Element<unsigned, AllocType::NextBit, 4>;
+  using CoallocNumOperands =
----------------
It's not obvious to me what the layout is. Can you add a comment above `Info` that spells it out clearly? Bitfield syntax (`// unsigned Name : Size;`) would be clear I think. (I'm not totally convinced the abstraction improves readability or reduces bugs, but I'm not against it if you think it helps.)
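For example, something like (illustrative; the exact widths should come from the `Bitfield::Element` definitions above):
```
lang=c++
// Layout of Info, LSB first (in bitfield syntax):
//   unsigned AllocType          : 2;  // AllocationType
//   unsigned CoallocCapacity    : 4;
//   unsigned CoallocNumOperands : N;  // whatever bits remain
```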


================
Comment at: llvm/include/llvm/IR/Metadata.h:969-972
+  static constexpr unsigned CoCapSize = CoallocCapacity::Bits;
+  /// The maximum number of MDOperands that can be coallocated with an MDNode.
+  static constexpr unsigned MaxCoallocCapacity =
+      bitfields_details::BitPatterns<CoCapType, CoCapSize>::Umax;
----------------
Usually we do sizes as `size_t`, except when we're optimizing storage. There's no storage here though.


================
Comment at: llvm/include/llvm/IR/Metadata.h:1032-1060
+  AllocationType getAllocationType() const {
+    return Bitfield::get<AllocType>(Info);
+  }
+
+  void setAllocationType(AllocationType Type) {
+    Bitfield::set<AllocType>(Info, Type);
+  }
----------------
These all feel very `private` to me, not `protected`. (Also, I'm really not sure all this boilerplate is better than a normal bitfield.)


================
Comment at: llvm/include/llvm/IR/Metadata.h:1040-1042
+  unsigned getCoallocCapacity() const {
+    return Bitfield::get<CoallocCapacity>(Info);
+  }
----------------
Elsewhere in LLVM, co-allocated storage is usually called "small". I suggest using the word "small" instead of "coalloc".


================
Comment at: llvm/include/llvm/IR/Metadata.h:1129-1131
+  /// Increase the operand capacity to \c NumOps.
+  /// \pre Must be either a temporary or distinct node.
+  void reserve(unsigned NumOps);
----------------
I think this should be protected and only enabled on `MDTuple` for now. Callers can do a `cast<MDTuple>()`... although, with the `append()` API, is there even a reason to need this? I don't see any uses except in tests, and the tests could just call `append()`.


================
Comment at: llvm/include/llvm/IR/Metadata.h:1294-1300
+  void append(ArrayRef<Metadata *> Ops);
+
+  void append(ArrayRef<MDOperand> Ops) {
+    append(makeArrayRef(
+        reinterpret_cast<Metadata **>(const_cast<MDOperand *>(Ops.begin())),
+        Ops.size()));
+  }
----------------
I think `append` should be protected and only exposed in `MDTuple`, for now. Most of the node kinds have a fixed operand count and calling `append()` would corrupt them.
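Concretely, the shape I'm imagining (a sketch):
```
lang=c++
class MDNode : public Metadata {
protected:
  void append(ArrayRef<Metadata *> Ops); // implementation stays here
  // ...
};

class MDTuple : public MDNode {
public:
  using MDNode::append; // only tuples have a variable operand count
  // ...
};
```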


================
Comment at: llvm/include/llvm/IR/Metadata.h:1389
+  static MDTuple *getDistinct(LLVMContext &Context, ArrayRef<Metadata *> MDs,
+                              unsigned Capacity = 0) {
+    return getImpl(Context, MDs, Distinct, Capacity);
----------------
Is an explicit capacity really useful/needed? I don't see any uses of this parameter except tests. It might be useful in tests to *access* the capacity (to confirm growth doubles or whatever), but if tests want to influence the capacity seems like they could just use `append()`.


================
Comment at: llvm/lib/IR/Metadata.cpp:569
   MDNode *N = static_cast<MDNode *>(Mem);
-  size_t OpSize = N->NumOperands * sizeof(MDOperand);
-  OpSize = alignTo(OpSize, alignof(uint64_t));
+  MDOperand *O = N->mutable_begin() + N->getCapacity();
 
----------------
It's a bit sketchy calling anything on `N` during `delete` (hence the msan suppression)... it seems even dodgier to call a member function that we can't even see.

Maybe as a prep commit we can refine MDNode's layout to fix the msan issue by moving the first 64 bits ahead of the `MDNode`. In MDNode, something like:
```
lang=c++
class MDNode {
  struct Header {
    unsigned NumOperands;
    unsigned NumUnresolved = 0;

    static constexpr size_t getAllocSize(unsigned NumOps) {
      return sizeof(MDOperand) * NumOps + sizeof(Header);
    }
    void *getAlloc() {
      return reinterpret_cast<char *>(this + 1) - getAllocSize(NumOperands);
    }

    MutableArrayRef<MDOperand> operands();
    ArrayRef<MDOperand> operands() const;
    explicit Header(unsigned NumOperands);
    ~Header();
  };
  Header &getHeader() {
    return *(reinterpret_cast<Header *>(this) - 1);
  }
  const Header &getHeader() const {
    return *(reinterpret_cast<const Header *>(this) - 1);
  }
```
`Header` will be co-allocated immediately before the `MDNode`, and contains everything necessary to understand its layout.

When allocating:
```
lang=c++
  void *Mem = malloc(Header::getAllocSize(NumOps) + Size);
  // OpsAlloc is the size of the operand area that precedes the Header.
  size_t OpsAlloc = Header::getAllocSize(NumOps) - sizeof(Header);
  Header &H = *new (reinterpret_cast<char *>(Mem) + OpsAlloc) Header(NumOps);
  return reinterpret_cast<void *>(&H + 1);
```

Here, when deleting:
```
lang=c++
void MDNode::operator delete(void *Mem) {
  Header *H = reinterpret_cast<Header *>(Mem) - 1;
  Mem = H->getAlloc();
  H->~Header(); // destroys operands
  ::operator delete(Mem);
}
```

This prep commit would be almost NFC (no functionality change) -- except for fixing the MSan thing and dropping the suppression -- and you could make other refactorings in it to remove NFC noise from this patch, making this one easier to review.

For example, you can structure `Header`'s helper functions to be similar to the `HungOffOperandStorage` helpers this commit will land; maybe you can even share the implementations... or maybe not, but it could simplify the code a bit if you did.



================
Comment at: llvm/lib/IR/Metadata.cpp:571-572
 
-  MDOperand *O = static_cast<MDOperand *>(Mem);
-  for (MDOperand *E = O - N->NumOperands; O != E; --O)
+  for (MDOperand *E = O - N->getCapacity(); O != E; --O)
     (O - 1)->~MDOperand();
+
----------------
This logic doesn't look quite right to me. If there's a capacity of 2 and only 1 op, then I think we'll have this in memory:
```
struct {
  MDOperand Op0;
  char Op1[sizeof(MDOperand)]; // raw memory; no MDOperand constructed here
  MDNode Node;
};
```
I think you need to skip over `Op1` to avoid destroying something that hasn't been constructed, and I'm not seeing how that happens...

But if you make the change I suggested above in a prep commit, using `Header::ops_begin()` and `Header::ops_end()`, then you won't even have to modify this loop here.
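In that world the destruction loop lives in `Header`'s destructor, roughly (a sketch):
```
lang=c++
Header::~Header() {
  // Destroy exactly the operands that were constructed, in reverse order;
  // Header knows the real count, so no capacity math is needed here.
  for (MDOperand &Op : llvm::reverse(operands()))
    Op.~MDOperand();
}
```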


================
Comment at: llvm/lib/IR/Metadata.cpp:587-589
+  assert(
+      getMetadataID() == MDTupleKind &&
+      "Increasing the number of operands is not supported for this node kind");
----------------
Ah, just spotted this late. Glad to see it, but seems like the related APIs should *only* show up in the public interface of `MDTuple`.


================
Comment at: llvm/lib/IR/Metadata.cpp:596-597
+  size_t OldOpSize = getNumOperands() * sizeof(MDOperand);
+  size_t HungOffSize =
+      sizeof(HungOffOperandStorage) + (NumOps - 1) * sizeof(MDOperand);
+
----------------
(I was a bit confused by `NumOps - 1` here at first, but then I looked at HungOffOperandStorage and figured it out. I suggested a refactoring there that would make this just `NumOps`... a bit simpler I think.)

I'm not seeing the exponential resizing that I think we want. I think we want something like `std::max(NumOps, OldNumOps * 2)`, i.e., at least double the old operand count, in order to grow. (Maybe I missed it and it's elsewhere? See, for example, `SmallVectorBase::grow()`.)

There should also be an assertion for overflow if we go over `uint32_t`.

I'm wondering if it'd be better to defer most of the "growth" logic (except the first grow) to a battle-hardened vector class (for now, SmallVector). Here's the layout I'm thinking of:
```
lang=c++
// Pseudo-code, using template parameters for runtime things...
template <size_t Size> struct MDOperandCharArray {
  alignas(alignof(MDOperand)) char Buffer[sizeof(MDOperand) * Size];
};
template <size_t InitNumOps, bool IsDynamic> struct Layout {
  using LargeStorageVector = SmallVector<MDOperand, 0>;

  static constexpr size_t NumOpsFitInVector = sizeof(LargeStorageVector) / sizeof(MDOperand);
  static_assert(NumOpsFitInVector * sizeof(MDOperand) == sizeof(LargeStorageVector));

  static constexpr size_t MaxSmallSize = 15;
  static constexpr bool InitLarge = InitNumOps > MaxSmallSize;
  static constexpr bool NeedsLarge = InitLarge || IsDynamic;
  static constexpr size_t MinSmallSize = NeedsLarge ? NumOpsFitInVector : 0;

  static constexpr size_t SmallSize =
      MinSmallSize <= InitNumOps && InitNumOps <= MaxSmallSize ? InitNumOps
                                                               : MinSmallSize;
  MDOperandCharArray<SmallSize> Ops;
  Header H;
  MDNode N;
  MDNodeSubclassTail Tail;
};
```
For dynamic nodes that start with 0 or 1 ops, `Layout` is bigger than what you have (for 64-bit, minimum small size of 2). But otherwise I think it's similar.

(Long-term, could add a (generic) `TinyVector` to ADT that is pointer-sized, and it could be used here to shrink the local allocation.)

Growing/shrinking would look like:
```
lang=c++
void Header::resize(size_t NumOps) {
  assert(IsDynamic);
  if (operands().size() == NumOps)
    return;

  if (!isSmall())
    getLarge().resize(NumOps);
  else if (NumOps <= SmallSize)
    resizeSmall(NumOps);
  else
    resizeSmallToLarge(NumOps);
}
void Header::resizeSmallToLarge(size_t NumOps) {
  assert(isSmall());
  assert(NumOps > SmallSize);
  LargeStorageVector NewOps(NumOps);
  llvm::move(operands(), NewOps.begin());
  resizeSmall(0);
  new (getLargePtr()) LargeStorageVector(std::move(NewOps));
  setToLarge(); // Sets the bit in Header that says large storage.
}
void Header::resizeSmall(size_t NumOps) {
  MutableArrayRef<MDOperand> ExistingOps = operands();
  assert(isSmall());
  assert(NumOps <= SmallSize);
  assert(NumOps != ExistingOps.size());
  int NumNew = (int)NumOps - (int)ExistingOps.size();
  MDOperand *O = ExistingOps.end();
  if (NumNew > 0) {
    for (int I = 0, E = NumNew; I != E; ++I)
      new (O++) MDOperand();
  } else {
    for (int I = 0, E = NumNew; I != E; --I)
      (--O)->~MDOperand();
  }
  setSmallNumOps(NumOps); // Records NumOps in Header.
  assert(O == operands().end());
}
void *Header::getLargePtr() {
  // Store it just before the Header so it's likely to be in the same cache line.
  static_assert(alignof(LargeStorageVector) <= alignof(Header));
  return reinterpret_cast<char *>(this) - sizeof(LargeStorageVector);
}
void *Header::getSmallPtr() {
  static_assert(alignof(MDOperand) <= alignof(Header));
  return reinterpret_cast<char *>(this) - sizeof(MDOperand) * SmallSize;
}
LargeStorageVector &Header::getLarge() {
  assert(!isSmall());
  return *reinterpret_cast<LargeStorageVector *>(getLargePtr());
}
MutableArrayRef<MDOperand> Header::operands() {
  if (!isSmall())
    return getLarge();
  return makeMutableArrayRef(reinterpret_cast<MDOperand *>(getSmallPtr()), SmallNumOps);
}
```


================
Comment at: llvm/lib/IR/Metadata.cpp:607-610
+  // FIXME: Should we define MDOperand's move constructor to do this?
+  std::memcpy(NewOps, OldOps, OldOpSize);
+  for (MDOperand *OO = OldOps, *NO = NewOps; OO != op_end(); OO++, NO++)
+    MetadataTracking::retrack(OO, **OO, NO);
----------------
What does the move constructor do? Can we fix it in a prep commit?
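If it just copies the pointer without retracking, a prep-commit fix might look roughly like this (a sketch, mirroring the three-argument `MetadataTracking::retrack` used in the hunk above):
```
lang=c++
MDOperand(MDOperand &&Op) {
  MD = Op.MD;
  if (MD)
    (void)MetadataTracking::retrack(&Op, *MD, this);
  Op.MD = nullptr;
}
```
Then the `memcpy`-plus-`retrack` dance above becomes a plain `std::move` loop.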


================
Comment at: llvm/lib/IR/Metadata.cpp:953
 
-  return storeImpl(new (MDs.size()) MDTuple(Context, Storage, Hash, MDs),
+  size_t Size = std::max(MDs.size(), static_cast<size_t>(Capacity));
+  return storeImpl(new (Size, Storage) MDTuple(Context, Storage, Hash, MDs),
----------------
I think we usually use a C-style cast for integer casts (i.e., `(size_t)Capacity`). I don't feel strongly, though; if you prefer this, maybe check whether it's documented in the LLVM coding style guide?


================
Comment at: llvm/lib/Linker/IRMover.cpp:1413-1422
+    auto ensureDistinctOp = [&](MDNode *DstValue) {
+      if (DstValue->isDistinct())
+        return DstValue;
+      MDNode *New = MDNode::replaceWithDistinct(DstValue->clone());
+      Metadata *FlagOps[] = {DstOp->getOperand(0), ID, New};
+      MDNode *Flag = MDNode::get(DstM.getContext(), FlagOps);
+      DstModFlags->setOperand(DstIndex, Flag);
----------------
This is simpler if you can assume `MDTuple`, since you don't need to clone and replace. You can just make a new one without worrying about other fields.

(IIRC, `clone()` has some overhead by being temporary to start...)
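Roughly (a sketch reusing the names from this hunk, assuming the operand is a tuple):
```
lang=c++
auto ensureDistinctOp = [&](MDNode *DstValue) -> MDNode * {
  if (DstValue->isDistinct())
    return DstValue;
  // Rebuild directly as a distinct tuple; no clone()/replaceWithDistinct().
  SmallVector<Metadata *, 8> Ops(DstValue->op_begin(), DstValue->op_end());
  MDNode *New = MDTuple::getDistinct(DstM.getContext(), Ops);
  Metadata *FlagOps[] = {DstOp->getOperand(0), ID, New};
  DstModFlags->setOperand(DstIndex, MDNode::get(DstM.getContext(), FlagOps));
  return New;
};
```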


================
Comment at: llvm/lib/Linker/IRMover.cpp:1418
+      Metadata *FlagOps[] = {DstOp->getOperand(0), ID, New};
+      MDNode *Flag = MDNode::get(DstM.getContext(), FlagOps);
+      DstModFlags->setOperand(DstIndex, Flag);
----------------
Not much point in the flag being uniqued if it's pointing at something that isn't uniqued. I wonder if this should be `MDTuple::getDistinct()` (leave it as-is if it's already the right kind).


================
Comment at: llvm/lib/Linker/IRMover.cpp:1444-1462
     case Module::Append: {
-      MDNode *DstValue = cast<MDNode>(DstOp->getOperand(2));
+      MDNode *DstValue = ensureDistinctOp(cast<MDNode>(DstOp->getOperand(2)));
       MDNode *SrcValue = cast<MDNode>(SrcOp->getOperand(2));
-      SmallVector<Metadata *, 8> MDs;
-      MDs.reserve(DstValue->getNumOperands() + SrcValue->getNumOperands());
-      MDs.append(DstValue->op_begin(), DstValue->op_end());
-      MDs.append(SrcValue->op_begin(), SrcValue->op_end());
-
-      replaceDstValue(MDNode::get(DstM.getContext(), MDs));
+      DstValue->append(makeArrayRef(SrcValue->op_begin(), SrcValue->op_end()));
       break;
     }
     case Module::AppendUnique: {
----------------
Can we delay this usage to a follow-up patch? I'd rather add the IR feature first (since it's non-trivial) and then add the adoption.

It is useful to see it at the same time to understand usage. Phabricator has a feature to link two reviews together (edit related revisions -> child/parent).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124548/new/

https://reviews.llvm.org/D124548


