[Mlir-commits] [mlir] Add arith expansion of f8E8M0 type for extf/trunc ops (PR #140332)
Umang Yadav
llvmlistbot at llvm.org
Tue May 20 12:33:28 PDT 2025
umangyadav wrote:
> As discussed on the IREE issue, I believe that there is a longstanding tradition that if a conversion returns a finite value then it should be the nearest representable value, that is corroborated by table 3 in the OCP spec discussing the "overflow or saturate" semantics, that is broken by taking the absolute value.
Table 3 is for FP8 types except F8E8M0.
F8E8M0 is used for the shared block scale, which is in fact calculated by extracting the exponent bits of `fabs(value.f32)`. So it is not really a "conversion" or "cast" in the conventional sense.
The OCP spec has this definition for F8E8M0:
"E8M0 is an unsigned representation of a conventional biased Float32 exponent"
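To make the point concrete, here is a minimal sketch of what "extracting the exponent bits of `fabs(value.f32)`" could look like. The helper name `f32ToE8M0Bits` is hypothetical, and the sketch assumes plain truncation of the mantissa with no rounding; it is not taken from the PR or from the reference kernel.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the PR): returns the 8 biased exponent
// bits of |x|, i.e. an E8M0-style scale byte.
// Assumption: the mantissa is simply dropped, with no rounding.
inline std::uint8_t f32ToE8M0Bits(float x) {
  // Take the magnitude first, mirroring the fabs() in the description.
  float mag = std::fabs(x);

  // Reinterpret the float as its raw IEEE-754 binary32 bit pattern.
  std::uint32_t bits;
  std::memcpy(&bits, &mag, sizeof(bits));

  // Bits 23..30 hold the biased exponent; shift and mask to isolate them.
  return static_cast<std::uint8_t>((bits >> 23) & 0xFFu);
}
```

Under these assumptions `1.0f` and `-1.0f` both map to 127 (the Float32 bias), so the sign never influences the result, which is why this does not behave like a nearest-value cast.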
Here is one reference:
https://github.com/amd/Quark/blob/60cd6e46d20a5553a7b1a754c0459737f3c31fde/quark/onnx/operators/custom_ops/src/mx/cuda/mx_kernel.cu#L63
https://github.com/llvm/llvm-project/pull/140332