[Mlir-commits] [mlir] [AMD][ROCDL][AMDGPU] Support packed conversions fp8/bf8->bf16 and fp8/bf8->fp32 (PR #131850)
Yi Qian
llvmlistbot at llvm.org
Wed Mar 19 11:44:45 PDT 2025
yiqian1 wrote:
> @yiqian1 Please add guards to prevent using the bf16 instructions on gfx942
>
> That is,
>
> ```
> %y = arith.extf %x : f8E4M3FNUZ to bf16
> ```
>
> on gfx942 needs to go via `f32`
An `arith.extf` to any non-f32 type will go via f32, as shown in these test cases:
```
%w = arith.extf %v : f8E5M2FNUZ to f16
```
and
```
%w = arith.extf %v : vector<2xf8E5M2FNUZ> to vector<2xf64>
```
`->bf16` is similar.
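The via-f32 path amounts to a two-step sequence like the following (a conceptual sketch, not verbatim pass output; `%x` is a placeholder value):

```
// Extend the 8-bit float to f32 first, since the hardware
// conversion targets f32, then truncate to the narrower type.
%f = arith.extf %x : f8E4M3FNUZ to f32
%y = arith.truncf %f : f32 to bf16
```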
Note that fp8/bf8->bf16 conversions are not currently used when lowering `amdgpu.ext_packed_fp8`. We can add them for gfx950 in the future if we want. In this PR, I only updated `amdgpu.ext_packed_fp8` with packed fp8/bf8->fp32 conversions, which are available on both gfx942 and gfx950.
https://github.com/llvm/llvm-project/pull/131850