[PATCH] D124548: [Metadata] Add a resize/reserve capability for MDNodes

Tue May 3 11:15:46 PDT 2022

wolfgangp added inline comments.

================
Comment at: llvm/include/llvm/IR/Metadata.h:953-960
+  unsigned Info; // Storage for the following bitfields.
+
+  using AllocType =
+      Bitfield::Element<AllocationType, 0, 2, AllocationType::Large>;
+  /// The capacity of an MDNode with co-allocated operands.
+  using CoallocCapacity = Bitfield::Element<unsigned, AllocType::NextBit, 4>;
+  using CoallocNumOperands =
----------------
dexonsmith wrote:
> It's obvious to me what the layout is. Can you add a comment above Info that spells it out clearly? Bitfield syntax (`// unsigned Name : Size;`) would be clear I think. (I'm not totally convinced the abstraction improves readability or reduces bugs but I'm not against it if you think it helps.)
I'm not sure it helps either. I'm always a bit uncertain about how much the availability of abstractions constitutes an encouragement to use them.  In this case, I guess it makes the derivation of properties like max values from bitfields a bit easier, but it's not that hard to get it right anyway. Otherwise it feels like abstraction for abstractions sake. I'd be happy to get rid of it if you agree.

================
Comment at: llvm/include/llvm/IR/Metadata.h:1389
+  static MDTuple *getDistinct(LLVMContext &Context, ArrayRef<Metadata *> MDs,
+                              unsigned Capacity = 0) {
+    return getImpl(Context, MDs, Distinct, Capacity);
----------------
dexonsmith wrote:
> Is an explicit capacity really useful/needed? I don't see any uses of this parameter except tests. It might be useful in tests to *access* the capacity (to confirm growth doubles or whatever), but if tests want to influence the capacity seems like they could just use `append()`.
I had envisioned a slightly different usage model in that clients would figure out in advance how many operands they need and reserve the space first. With the automatic expansion done by resize() I was afraid of too much waste, given how much space large LTO links are already using, for example.

================
Comment at: llvm/lib/IR/Metadata.cpp:571-572

-  MDOperand *O = static_cast<MDOperand *>(Mem);
-  for (MDOperand *E = O - N->NumOperands; O != E; --O)
+  for (MDOperand *E = O - N->getCapacity(); O != E; --O)
     (O - 1)->~MDOperand();
+
----------------
dexonsmith wrote:
> This logic doesn't look quite right to me. If there's a capacity of 2 and only 1 op, then I think we'll have this in memory:
> ```
> struct {
>   MDOperand Op0;
>   char Op1[sizeof(MDOperand)] Op1;
>   MDNode Node;
> };
> ```
> I think you need to skip over `Op1` to avoid destroying something that hasn't been constructed, and I'm not seeing how that happens...
> 
> But if you make the change I suggested above in a prep commit, using `Header::ops_begin()` and `Header::ops_end()`, then you won't even have to modify this loop here.
Hmm, all the allocated operands should have been constructed in operator new. In this scheme the idea was that operator new allocates the capacity and the constructor assigns the actual operands, of which there may be fewer than the capacity. Either way, all allocated operand slots should be properly constructed.

In any case, if we don't separate capacity and number of actual assigned operands, this is academic.

================
Comment at: llvm/lib/IR/Metadata.cpp:607-610
+  // FIXME: Should we define MDOperand's move constructor to do this?
+  std::memcpy(NewOps, OldOps, OldOpSize);
+  for (MDOperand *OO = OldOps, *NO = NewOps; OO != op_end(); OO++, NO++)
+    MetadataTracking::retrack(OO, **OO, NO);
----------------
dexonsmith wrote:
> What does the move constructor do? Can we fix it in a prep commit?
The move constructor copies the operand to the newly allocated slot and does retrack(). Optionally, it could also null out the old location (or call the destructor on it).

================
Comment at: llvm/lib/Linker/IRMover.cpp:1444-1462
     case Module::Append: {
-      MDNode *DstValue = cast<MDNode>(DstOp->getOperand(2));
+      MDNode *DstValue = ensureDistinctOp(cast<MDNode>(DstOp->getOperand(2)));
       MDNode *SrcValue = cast<MDNode>(SrcOp->getOperand(2));
-      SmallVector<Metadata *, 8> MDs;
-      MDs.reserve(DstValue->getNumOperands() + SrcValue->getNumOperands());
-      MDs.append(DstValue->op_begin(), DstValue->op_end());
-      MDs.append(SrcValue->op_begin(), SrcValue->op_end());
-
-      replaceDstValue(MDNode::get(DstM.getContext(), MDs));
+      DstValue->append(makeArrayRef(SrcValue->op_begin(), SrcValue->op_end()));
       break;
     }
     case Module::AppendUnique: {
----------------
dexonsmith wrote:
> Can we delay this usage to a follow-up patch? I'd rather add the IR feature first (since it's non-trivial) and then add the adoption.
> 
> It is useful to see it at the same time to understand usage. Phabricator has a feature to link two reviews together (edit related revisions -> child/parent).
Yeah, I'll follow the sequence you suggested up at the top.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124548/new/

https://reviews.llvm.org/D124548