[PATCH] D79671: [MsgPack] MsgPackDocument::readFromBlob now merges

Scott Linder via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 13 11:25:15 PDT 2020


scott.linder added a comment.

In D79671#2032032 <https://reviews.llvm.org/D79671#2032032>, @tpr wrote:

> In D79671#2030708 <https://reviews.llvm.org/D79671#2030708>, @scott.linder wrote:
>
> > Dinesh is working on merging metadata in ROCm CompilerSupport using MsgPackDocument, and it fell out that merging YAML metadata was also handled transparently. What is the advantage to doing this via a function pointer passed to `readFromBlob`? It doesn't seem to save any copies as you still need to be able to compare the nodes; is it just a matter of readability? Would it be reasonable to also add a callback for `fromYAML`?
>
>
> I too tried doing it by reading the second blob into a MsgPackDocument and merging it into the first one from there. But I think that does involve more copies/allocation/etc because a map in the second document would actually get created as a std::map before having its elements merged into the corresponding map in the first document.
>
> I guess the same thing as I have done could be done for the YAML reading code.


I see now, it seems like in the new version we still create the empty `std::map` and `std::array`, but as we identify them as conflicting immediately afterwards we never populate them. So we still have the same number of dead objects floating around, but we avoid populating them and then just copying out their contents, which saves us allocations and copies.

The other concern I have is that this may make it difficult to do semantic merges which need more context than the current node being merged. I think this is OK for HSA currently, because we just need to fail when we come across mismatching version nodes.



================
Comment at: llvm/include/llvm/BinaryFormat/MsgPackDocument.h:363
+  /// respectively, although it could be the array or map (respectively) that
+  /// was already there. MapKey is the key if *DestNode is a map entry, a nil
+  /// node otherwise. The default for Merger is to allow array and map merging,
----------------
`Nil` is a valid map key in MsgPack, so this is ambiguous; can we use an `Empty` entry, or just use an `Optional<DocNode>` where `None` represents the non-map case?


================
Comment at: llvm/include/llvm/BinaryFormat/MsgPackDocument.h:370-371
+          Merger = [](DocNode *DestNode, DocNode SrcNode, DocNode MapKey) {
+            return (DestNode->isMap() && SrcNode.isMap()) ||
+                   (DestNode->isArray() && SrcNode.isArray());
+          });
----------------
tpr wrote:
> foad wrote:
> > tpr wrote:
> > > foad wrote:
> > > > Can't the body of this default merger be just `llvm_unreachable();`?
> > > Probably, although that is a change in behavior from before, which I'd rather not do as I think I am not the only user of it. (Also, even if we did do that, it should be "return false" so that the readFromBlob call returns false for failure.)
> > I don't get it. Surely in existing use cases, DestNode is nil so the merger will not be called. And if it *was* called, there's no point in it returning true unless it actually copies some data from SrcNode to DestNode, otherwise it'll claim to have done a merge but actually it ignored the new incoming data and left the document exactly as it was (i.e. empty). What am I missing?
> It's a bit of a corner case, but, in the code before, I think you could have a map containing two entries of the same name, both maps, and it would merge them together. Having two entries of the same name is probably not legal MsgPack in some sense.
Isn't the behavior already changing? In the old code any existing document is discarded and you essentially parse a fresh document to replace it. In the new code the default is effectively inverted, and if you don't do anything special in your `Merger` you get the existing document with only distinct bits added in from the newly parsed one. If we are completely changing the semantics anyway it seems reasonable to fix odd bits like this.

Alternatively we could just have Merger be `Optional` and retain the behavior completely if it is not present.


================
Comment at: llvm/lib/BinaryFormat/MsgPackDocument.cpp:158
     }
+    if (!DestNode->isNil()) {
+      // In a merge, there is already a value at this position. Call the
----------------
Shouldn't this check for `Empty`? Why is `Nil` special concerning merges?

Is this because we represent an empty document as a `Nil` node? I didn't notice that in the original review, but I think we need to represent an empty document in a way that is distinct from one of the possible documents (in this case we can't distinguish an empty document from a document containing a single `Nil` node).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79671/new/

https://reviews.llvm.org/D79671





More information about the llvm-commits mailing list