[PATCH] D124548: [Metadata] Add a resize/reserve capability for MDNodes

Tue May 3 11:51:22 PDT 2022

dexonsmith added inline comments.

================
Comment at: llvm/include/llvm/IR/Metadata.h:953-960
+  unsigned Info; // Storage for the following bitfields.
+
+  using AllocType =
+      Bitfield::Element<AllocationType, 0, 2, AllocationType::Large>;
+  /// The capacity of an MDNode with co-allocated operands.
+  using CoallocCapacity = Bitfield::Element<unsigned, AllocType::NextBit, 4>;
+  using CoallocNumOperands =
----------------
wolfgangp wrote:
> dexonsmith wrote:
> > It's obvious to me what the layout is. Can you add a comment above Info that spells it out clearly? Bitfield syntax (`// unsigned Name : Size;`) would be clear I think. (I'm not totally convinced the abstraction improves readability or reduces bugs but I'm not against it if you think it helps.)
> I'm not sure it helps either. I'm always a bit uncertain about how much the availability of abstractions constitutes an encouragement to use them.  In this case, I guess it makes the derivation of properties like max values from bitfields a bit easier, but it's not that hard to get it right anyway. Otherwise it feels like abstraction for abstractions sake. I'd be happy to get rid of it if you agree.
SGTM!

================
Comment at: llvm/include/llvm/IR/Metadata.h:1389
+  static MDTuple *getDistinct(LLVMContext &Context, ArrayRef<Metadata *> MDs,
+                              unsigned Capacity = 0) {
+    return getImpl(Context, MDs, Distinct, Capacity);
----------------
wolfgangp wrote:
> dexonsmith wrote:
> > Is an explicit capacity really useful/needed? I don't see any uses of this parameter except tests. It might be useful in tests to *access* the capacity (to confirm growth doubles or whatever), but if tests want to influence the capacity seems like they could just use `append()`.
> I had envisioned a slightly different usage model in that clients would figure out in advance how many operands they need and reserve the space first. With the automatic expansion done by resize() I was afraid of too much waste, given how much space large LTO links are already using, for example.
With automatic expansion it'll be a strict improvement in memory usage over what it was before (previously, all previous sizes would still be around) and it changes a quadratic operation to amortized linear (calling `Metadata::retrack()` isn't cheap). I think it's a better starting point.

If a profile shows that the automatically-expanded nodes are too expensive in practice, could add a `shrink_to_fit()` later.

================
Comment at: llvm/lib/IR/Metadata.cpp:571-572

-  MDOperand *O = static_cast<MDOperand *>(Mem);
-  for (MDOperand *E = O - N->NumOperands; O != E; --O)
+  for (MDOperand *E = O - N->getCapacity(); O != E; --O)
     (O - 1)->~MDOperand();
+
----------------
wolfgangp wrote:
> dexonsmith wrote:
> > This logic doesn't look quite right to me. If there's a capacity of 2 and only 1 op, then I think we'll have this in memory:
> > ```
> > struct {
> >   MDOperand Op0;
> >   char Op1[sizeof(MDOperand)] Op1;
> >   MDNode Node;
> > };
> > ```
> > I think you need to skip over `Op1` to avoid destroying something that hasn't been constructed, and I'm not seeing how that happens...
> > 
> > But if you make the change I suggested above in a prep commit, using `Header::ops_begin()` and `Header::ops_end()`, then you won't even have to modify this loop here.
> Hmm, all the allocated operands should have been constructed in operator new. In this scheme the idea was that operator new allocates the capacity and the constructor assigns the actual operands, of which there may be fewer than the capacity. Either way, all allocated operand slots should be properly constructed.
> 
> In any case, if we don't separate capacity and number of actual assigned operands, this is academic.
Ah, as soon as I saw the word "capacity" I assumed there would be storage where the operands had not been constructed (similar to a vector). I do think automatic expansion is the right default for resizable storage.

================
Comment at: llvm/lib/IR/Metadata.cpp:607-610
+  // FIXME: Should we define MDOperand's move constructor to do this?
+  std::memcpy(NewOps, OldOps, OldOpSize);
+  for (MDOperand *OO = OldOps, *NO = NewOps; OO != op_end(); OO++, NO++)
+    MetadataTracking::retrack(OO, **OO, NO);
----------------
wolfgangp wrote:
> dexonsmith wrote:
> > What does the move constructor do? Can we fix it in a prep commit?
> The move constructor copies the operand to the newly allocated slot and does retrack(). Optionally, it could also null out the old location (or call the destructor on it).
Not obvious to me how this is different, but I think if there's an improvement to make in the move constructor we should fix it, and this (or `SmallVector<MDOperand>`) should just use it.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124548/new/

https://reviews.llvm.org/D124548