[Mlir-commits] [mlir] [mlir][RFC] Bytecode: op fallback path (PR #129784)

Wed Mar 5 11:14:24 PST 2025

nikalra wrote:

> Overall the idea is interesting. Is there an RFC on discourse? Some more questions/comments below.

I'll create one: https://discourse.llvm.org/t/rfc-bytecode-op-fallback-path/84993

> 
> > Instead, the arbitrary receiving process should be able to parse the module in a way that it can recognize whichever ops it _does_ support, perform downstream work on those ops, and faithfully encode the ops it did not recognize back into bytecode while still maintaining the original structure of the module.
> 
> Isn't this somewhat equivalent to parsing unregistered/unverifiable ops? It seems you are basically trying to read and partition the ops that are registered/verified by the dialect.
> 

Yes, except that the ops are registered at serialization time. The problem with using unregistered ops directly is that they encode properties as attributes, which breaks down for versioned ops that use a custom properties encoding to support reading/writing older versions.

There's another option here: converting newer ops to the fallback op before serialization. It would still require determining the dialect version used by the receiving process, but it would turn this into a serialization problem rather than a deserialization problem.
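
To make the pre-serialization option concrete, here is a rough sketch in plain Python. It is a toy model, not MLIR code: `Op`, `FallbackOp`, `since_version`, `downgrade_for_receiver`, `restore`, and the `mydialect.*` names are all hypothetical and only illustrate the idea of wrapping ops the receiver is too old to know while keeping their encoded properties intact.

```python
from dataclasses import dataclass

# Toy model only: none of these classes or functions are MLIR APIs.

@dataclass(frozen=True)
class Op:
    name: str           # e.g. "mydialect.stft" (hypothetical op name)
    since_version: int  # dialect version that introduced the op (assumed metadata)
    properties: bytes   # properties payload, already encoded by the dialect


@dataclass(frozen=True)
class FallbackOp:
    original_name: str  # name of the op being wrapped
    since_version: int
    properties: bytes   # carried through untouched
    name: str = "mydialect.fallback"  # hypothetical fallback op name


def downgrade_for_receiver(ops, receiver_version):
    """Wrap every op that is newer than the receiver's dialect version."""
    return [
        FallbackOp(op.name, op.since_version, op.properties)
        if isinstance(op, Op) and op.since_version > receiver_version
        else op
        for op in ops
    ]


def restore(ops, local_version):
    """Unwrap fallback ops that the local dialect version understands."""
    return [
        Op(op.original_name, op.since_version, op.properties)
        if isinstance(op, FallbackOp) and op.since_version <= local_version
        else op
        for op in ops
    ]


module = [Op("mydialect.add", 1, b"\x00"), Op("mydialect.stft", 3, b"\x2a")]
sent = downgrade_for_receiver(module, receiver_version=2)
```

In this sketch the sender has to know `receiver_version` up front, which is the dependency mentioned above; the fallback-on-read approach in the PR avoids that by deferring the decision to the receiving process.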

> How the fallback would extend to types/attributes for the same versioned dialect?

Types/attributes are a little easier because they can be treated as OpaqueType/OpaqueAttr as long as the dialect has an encoding scheme that supports round-tripping between them.
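
As a loose illustration of the round-tripping requirement, the following toy Python sketch keeps an unrecognized attribute as an opaque (dialect, raw bytes) pair and writes it back byte-for-byte. The names `OpaqueAttr`, `KNOWN_DECODERS`, `read_attr`, and `write_attr`, along with the one-byte `kind` tag, are assumptions made for this example and do not reflect the actual bytecode layout or any MLIR API.

```python
from dataclasses import dataclass

# Toy model only: the encoding below is invented for illustration.

@dataclass(frozen=True)
class OpaqueAttr:
    dialect: str
    data: bytes  # kind tag followed by the raw payload, kept verbatim


# Attribute kinds this (hypothetical) dialect version can decode.
# Kind 0 is a 4-byte little-endian integer.
KNOWN_DECODERS = {
    ("mydialect", 0): lambda payload: int.from_bytes(payload, "little"),
}


def read_attr(dialect, kind, payload):
    """Decode a known attribute kind, otherwise keep it opaque."""
    decoder = KNOWN_DECODERS.get((dialect, kind))
    if decoder is not None:
        return decoder(payload)
    return OpaqueAttr(dialect, bytes([kind]) + payload)


def write_attr(attr):
    """Return (kind, payload); opaque attributes are emitted unchanged."""
    if isinstance(attr, OpaqueAttr):
        return attr.data[0], attr.data[1:]
    return 0, attr.to_bytes(4, "little")


known = read_attr("mydialect", 0, (7).to_bytes(4, "little"))
unknown = read_attr("mydialect", 9, b"\x01\x02")
```

Here `write_attr(unknown)` gives back the original kind and payload, which is the property the dialect's encoding scheme would need to guarantee for the opaque path to be safe.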

> 
> > If a partition is made in the middle of the decomposition, it's impossible to recreate the original semantics even on a version of the dialect that does provide STFT.
> 
> Wouldn't this indicate a pass ordering problem, rather than a serialization problem?

True, unless bytecode is used as a transfer medium inside of a pass. In that case, changing the semantics of the module would end up changing the results of the pass. The alternative to using bytecode would be to create a sideband encoding scheme for operations and constants to be communicated to an external system, but that would effectively duplicate the capabilities provided by bytecode today.

> 
> > This RFC enables processes in this setup to avoid that overhead since everything is faithfully represented the same way in bytecode.
> 
> Have you considered using lazy loading + multiple serializations of the same top level func for different targets?

Yeah, but it ends up getting pretty complicated: the module may have many top-level funcs, and reconciling the different versions together at the end is non-trivial due to the number of permutations that exist.

https://github.com/llvm/llvm-project/pull/129784