[Mlir-commits] [mlir] [mlir][RFC] Bytecode: op fallback path (PR #129784)

Thu Mar 6 17:48:09 PST 2025

nikalra wrote:

> The op not being understood is one aspect, but you can't directly grab data from the bytecode and rely on that being able to roundtrip (the bytecode has references to different parts of itself that you won't be able to replicate and understand).

Agreed, but unless I'm misunderstanding what you're saying, I don't think this proposal does that. Specifically, the type and attribute sections (which I believe is what you're referring to) still need to get created at time of writing. Rather, this proposal just creates an additional path during parsing where the dialect has an opportunity to deserialize an op that cannot be deserialized using the default process.

> The easiest example here is what to do about new attributes/types that weren't present in an older dialect, if that attribute/type uses a custom encoding you can't take the bytecode for it directly and roundtrip it.

Agreed, but the BytecodeDialectInterface leaves attribute and type parsing up to the dialect to do in an implementation-defined manner. That's unchanged with this proposal, and it would be up to the dialect to handle that complexity.

> I also don't want to encourage trying to make expectations on the structure of bytecode or how things are referenced. Consider a new container attribute #foo.new_container<#foo.old_attr>, if foo.new_container uses a custom dialect representation (which is generally the encouraged/expected case), you won't know how to interpret its contents (to know that it's referencing #foo.old_attr). There is no way that you will reasonably be able to roundtrip this without essentially keeping entire sections of the original bytecode alive.

I don't think I'm following this.

In the same way that the current bytecode interfaces allow a dialect to define ser/de for an Attribute and Type, this proposal adds a path for a dialect to define custom deserialization for an Op that cannot be parsed using the default mechanisms. While the proposed use case is for passing through new ops, the infrastructure still provides a fallback path for parsing existing registered ops that could not be deserialized. The dialect can opt into this behavior but is not required to. The fallback path is also not required to succeed if the dialect cannot interpret the encoded contents.
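To make the control flow concrete, here is a rough standalone sketch of the behavior described above. It does not use the actual MLIR API or the code in this PR: `EncodedOp`, `Op`, `Dialect`, `readOp`, and `makeExampleDialect` are all hypothetical stand-ins, and the one-byte payload tag is an invented encoding used only for illustration.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are MLIR types.
struct EncodedOp {
  std::string name;
  std::vector<unsigned char> payload;
};
struct Op {
  std::string name;
  std::string note;
};

using OpReader = std::function<std::optional<Op>(const EncodedOp &)>;

struct Dialect {
  // Ops that the default deserialization path knows how to decode.
  std::map<std::string, OpReader> knownOps;
  // Optional fallback hook; left empty when the dialect does not opt in.
  OpReader fallback;
};

// Try the default path first. If it fails (unknown op, or a registered op
// that could not be decoded), give the dialect's fallback hook a chance.
std::optional<Op> readOp(const Dialect &dialect, const EncodedOp &enc) {
  auto it = dialect.knownOps.find(enc.name);
  if (it != dialect.knownOps.end())
    if (std::optional<Op> op = it->second(enc))
      return op;
  if (dialect.fallback)
    return dialect.fallback(enc); // may still fail
  return std::nullopt;
}

// Builds a toy dialect with one known op and, optionally, a fallback hook.
Dialect makeExampleDialect(bool optIn) {
  Dialect d;
  d.knownOps["foo.add"] = [](const EncodedOp &e) -> std::optional<Op> {
    if (e.payload.empty())
      return std::nullopt; // default decoding fails on an empty payload
    return Op{e.name, "default"};
  };
  if (optIn) {
    d.fallback = [](const EncodedOp &e) -> std::optional<Op> {
      // Invented convention: only payloads tagged 0x01 are understood.
      if (e.payload.empty() || e.payload[0] != 0x01)
        return std::nullopt;
      return Op{"foo.fallback", "wraps " + e.name};
    };
  }
  return d;
}
```

With this sketch, an unknown `foo.new_op` whose payload starts with `0x01` comes back as a `foo.fallback` op when the dialect opted in, and fails to parse when the dialect did not opt in or when the hook cannot interpret the payload.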

Considering the new container attribute, it would be on the dialect to encode the container in a way that the nested attributes can be serialized/deserialized using a dialect-defined encoding scheme, in the same way as would be required for the fallback op to successfully deserialize an unknown op. Commit 8d8ac5ff484058164a30ffdc90c6935107054a7e adds an example of what this could look like, but it isn't meant to be prescriptive about what that encoding should look like (nor does it support all cases).

It might be worth mentioning that this proposal doesn't create a passthrough mechanism for bytecode to be round-tripped; it just creates a mechanism for the dialect to round-trip unknown ops if it needs to. The onus is still on the dialect to get that right, as it is for any of the custom dialect representations that are currently enabled for types/attributes/properties. In other words, the dialect bytecode interface already gives dialects the ability to replace one type or attribute with another during deserialization; this does the same for ops.

https://github.com/llvm/llvm-project/pull/129784