[Mlir-commits] [mlir] [mlir][gpu] Add pass for emulating unsupported types. (PR #138087)

Wed May 7 09:26:43 PDT 2025

mshahneo wrote:

> Skimmed through the change again, and I think my main concern of the linearization bits have been solved. Just my two cents if it helps resolve the deadlock here.
> 
> First of all, @mshahneo thanks for working through this and making a fairly flushed out implementation with tests. I fully believe this solves the problem you are encountering, and I am sure other folks have hit this problem as well and would be a good common resource.
> 
> I have a few follow up questions though. It is not immediately clear to me what is the scope of this change. For example,
> 
> 1. what would happen for the case of `i4` or any sub-byte type. How is this supposed to be handled? This falls within the scope of the sub-byte emulation support that exists in tree.
> 2. How would this work for dialects that are not in tree. It seems like for this pass to work it will need to support all operations within the function. But if someone is using this in a function mixing operations from dialects in core, and downstream dialects, then this pass would break. So in that sense this pass is very fixed function.
> 
> In general it would have probably been better to get some community discussion going with RFC. Having an implementation like this actually makes the RFC stronger, but is a better forum to discuss than a PR. Another aspect to consider is the maintainability of the pass. When this lands in main, someone needs to be the defacto owner of it (ideally one of the existing folks who think this is a valuable thing to add and take on the maintainence of this). I know creating RFCs is kind of a pain, but MLIR does suffer a lot with abandoned dialects/methods/transformations that just become a maintainence burden now.

Thank you so so much, @MaheshRavishankar :).

Let me jump to your question first:
> 1. what would happen for the case of `i4` or any sub-byte type. How is this supposed to be handled? This falls within the scope of the sub-byte emulation support that exists in tree.

It depends. Normally, this pass would not affect the in-tree sub-byte emulation mechanism at all. But suppose a user or vendor has a use case where they want to emulate f4 (f4E2M1FN) using `i4`. The pass would emulate it as `i4`. If the emulated values are used in any arith/math operations, those operations would have to be replaced by the respective vendor-specific operations. Otherwise, the type would be handled by the SPIR-V converter as a sub-byte integer.
Does that answer your concern?
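
To make the f4-as-`i4` case concrete, here is a minimal Python sketch. It is not code from this PR; the function name `decode_f4e2m1fn` is hypothetical. It assumes the usual f4E2M1FN layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1, no Inf/NaN) and shows why emulation only preserves the storage bits, while arithmetic on those bits still needs a dedicated operation:

```python
def decode_f4e2m1fn(bits: int) -> float:
    """Interpret a 4-bit integer pattern as an f4E2M1FN value (illustrative only)."""
    if not 0 <= bits <= 0xF:
        raise ValueError("expected a 4-bit pattern")
    sign = -1.0 if bits & 0b1000 else 1.0
    exponent = (bits >> 1) & 0b11
    mantissa = bits & 0b1
    if exponent == 0:
        # Subnormal range: the value is mantissa * 2^-1.
        magnitude = 0.5 * mantissa
    else:
        # Normal range: implicit leading one, exponent bias of 1.
        magnitude = (1.0 + 0.5 * mantissa) * 2.0 ** (exponent - 1)
    return sign * magnitude


# The i4 storage carries the f4 bits unchanged, so loads and stores are safe.
a = 0b0011  # encodes 1.5
b = 0b0011  # encodes 1.5

# Integer addition on the raw bits does NOT give the floating-point sum.
naive_bits = (a + b) & 0xF
naive_value = decode_f4e2m1fn(naive_bits)             # what integer add yields
true_value = decode_f4e2m1fn(a) + decode_f4e2m1fn(b)  # the intended f4 sum
```

Here `naive_value` and `true_value` differ, which is the reason arith/math ops on the emulated type have to be swapped for operations that understand the original format.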

> 2. How would this work for dialects that are not in tree. It seems like for this pass to work it will need to support all operations within the function. But if someone is using this in a function mixing operations from dialects in core, and downstream dialects, then this pass would break. So in that sense this pass is very fixed function.

Currently, we handle any out-of-tree dialect ops using generic conversion logic; the pass only deals with operations that have a memref operand (primarily to handle memory operations).

That being said, downstream dialects can have other ops that need different logic. That is why we exposed the patterns through `populateImitateUnsupportedTypesConversionPatterns()`, so that users can reuse them in their own downstream passes if they have special use cases. This is similar to what other emulation passes do.
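
As a rough illustration of that design (a toy Python model, not the MLIR API), a shared "populate" function can add the common memref rewrite to a pattern list, and a downstream pass can append its own patterns next to it. Everything here is hypothetical: `MemRefType`, `populate_toy_emulation_patterns`, `apply_patterns`, the `bf16` to `i16` mapping, and the `mydialect.tile<...>` type are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemRefType:
    """Toy stand-in for a memref type: a shape plus an element type name."""
    shape: tuple
    element_type: str


def populate_toy_emulation_patterns(patterns, type_map):
    """Append the shared memref element-type rewrite to `patterns`."""
    def rewrite_memref(ty):
        if isinstance(ty, MemRefType) and ty.element_type in type_map:
            return MemRefType(ty.shape, type_map[ty.element_type])
        return None  # this pattern does not apply
    patterns.append(rewrite_memref)


def apply_patterns(patterns, ty):
    """Return the result of the first matching pattern, or `ty` unchanged."""
    for pattern in patterns:
        converted = pattern(ty)
        if converted is not None:
            return converted
    return ty


def rewrite_downstream_tile(ty):
    """Hypothetical downstream-only pattern for a made-up dialect type."""
    if ty == "mydialect.tile<bf16>":
        return "mydialect.tile<i16>"
    return None


# A downstream pass reuses the shared patterns and adds its own.
patterns = []
populate_toy_emulation_patterns(patterns, {"bf16": "i16", "f4E2M1FN": "i4"})
patterns.append(rewrite_downstream_tile)

converted_memref = apply_patterns(patterns, MemRefType((8, 8), "bf16"))
untouched_memref = apply_patterns(patterns, MemRefType((8, 8), "f32"))
converted_tile = apply_patterns(patterns, "mydialect.tile<bf16>")
```

The shared function only knows about memrefs; anything dialect-specific stays in the downstream pass that owns it.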

Please let me know if this answers your concern, or if you have any other concerns.

> Another aspect to consider is the maintainability of the pass. When this lands in main, someone needs to be the defacto owner of it (ideally one of the existing folks who think this is a valuable thing to add and take on the maintainence of this).

Yes, that is a good point.
I can open an RFC if you want.

Again, thank you so so much.

https://github.com/llvm/llvm-project/pull/138087