[Mlir-commits] [mlir] [mlir][gpu] Add pass for emulating unsupported types. (PR #138087)

Fri May 2 13:29:36 PDT 2025

mshahneo wrote:

> Thanks for taking the time for explaining your motivations, though I still fundamentally disagree with them.
> 
> The lowering to SPIR-V should replace all mentions of `bf16` with `i16`, and replace `arith.extf %x : bf16 to f32` and `arith.truncf %x : f32 to bf16` with operations that produce `i16`.
> 
> See, for example, the equivalent code in LLVMTypeConverter
> 
> ```c++
> Type LLVMTypeConverter::convertFloatType(FloatType type) const {
>   // Valid LLVM float types are used directly.
>   if (LLVM::isCompatibleType(type))
>     return type;
> 
>   // F4, F6, F8 types are converted to integer types with the same bit width.
>   if (isa<Float8E5M2Type, Float8E4M3Type, Float8E4M3FNType, Float8E5M2FNUZType,
>           Float8E4M3FNUZType, Float8E4M3B11FNUZType, Float8E3M4Type,
>           Float4E2M1FNType, Float6E2M3FNType, Float6E3M2FNType,
>           Float8E8M0FNUType>(type))
>     return IntegerType::get(&getContext(), type.getWidth());
> 
>   // Other floating-point types: A custom type conversion rule must be
>   // specified by the user.
>   return Type();
> }
> ```
> 
> This should correctly convert kernel arguments etc.
> 
> One other thing we may want to do here is to relax the verifier on `gpu.launch_func` to allow a bf16/i16 mismatch if it turns out that's an issue. I think that's a better solution than this thing

Thank you so much for your suggestion. I'll take a look.
I understand that this approach has some restrictions, especially its reliance on the view op, although a `memref.bitcast`-like op would remove all of them.
Could you please elaborate on why you disagree with this approach?

Let me lay out my reasoning for the pass-based approach:

- Flexibility: one of the main reasons we chose this approach is that a pass is optional. Depending on the use case, you can choose whether or not to use this pipeline, and the core support stays unaffected.
- Portability: as a pass at the GPU dialect level, it can be used with any backend (LLVM/SPIR-V), although LLVM already handles this scenario, as you pointed out.
- It does not change the fundamentals of the existing lowering and verification (e.g., as you mentioned, `gpu.launch_func` would otherwise have to allow a mismatch).
- It allows arbitrary source and target types of the same bitwidth (although I know the most common scenario is unsupported floats to same-width ints).
- It makes way for keeping vendor-specific lowering patterns (e.g., Intel-specific bf16 conversion instructions) separate from the core lowering patterns.

Some of the above can be achieved with your approach, but doing so would make the core lowering logic to SPIR-V more convoluted.
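
To make the "same bitwidth" point concrete, here is a minimal host-side C++ sketch of what carrying `bf16` in a 16-bit integer means. This is not code from the PR or from MLIR: the names `bf16_bits`, `bf16_bits_to_f32`, and `f32_to_bf16_bits` are hypothetical, and the round-to-nearest-even choice in the narrowing direction is an assumption about the desired semantics.

```c++
#include <cstdint>
#include <cstring>

// Hypothetical alias: a bf16 value carried as its raw 16-bit pattern.
using bf16_bits = std::uint16_t;

// Widening (what an emulated extf would compute): bf16 is the upper
// 16 bits of an IEEE-754 binary32 value, so shift into place.
inline float bf16_bits_to_f32(bf16_bits b) {
  std::uint32_t u = static_cast<std::uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Narrowing (what an emulated truncf would compute), assuming
// round-to-nearest-even. NaNs are kept as quiet NaNs.
inline bf16_bits f32_to_bf16_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  if ((u & 0x7fffffffu) > 0x7f800000u)  // NaN: set a quiet bit, keep sign.
    return static_cast<bf16_bits>((u >> 16) | 0x0040u);
  // Add 0x7fff plus the lowest kept bit so that ties round to even.
  std::uint32_t bias = 0x7fffu + ((u >> 16) & 1u);
  return static_cast<bf16_bits>((u + bias) >> 16);
}
```

Because the storage is an ordinary 16-bit integer, kernel arguments and memory can be typed as `i16` while only the conversion points need float-aware handling.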

https://github.com/llvm/llvm-project/pull/138087