[Mlir-commits] [mlir] [MLIR] Add f8E8M0FNU type (PR #111028)

Thu Oct 3 11:11:06 PDT 2024

sergey-kozub wrote:

> Fly on the wall, but at what point, upon removing all features that traditionally make something a "floating point number" (mantissa, zero, denorms, infinities) does something no longer make any sense at all being part of a floating point hierarchy. It's just a bit-vector with a special error value.

I also wondered what makes it a floating point number.
From my perspective, both integer and floating-point types represent values in a numeric range. The difference is that for integer types the distance between adjacent values is constant, while for floating-point types it varies.

For E8M0, the mantissa is still there, but it is implicit: a single bit whose value is always one. Other FP types also have an implicit leading bit.
Unsigned ints can't represent negative numbers either (same as E8M0). Not having infinities is also common, e.g. in other FP8 types such as E4M3FN and E5M2FNUZ.
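
To make the "variable distance" point concrete, here is a minimal decoding sketch. It assumes the encoding from the OCP Microscaling (MX) spec: an exponent bias of 127 and the all-ones pattern 0xFF reserved for NaN. The names `decode_e8m0`, `E8M0_BIAS` and `E8M0_NAN` are made up for illustration and are not part of MLIR or APFloat.

```python
import math

E8M0_BIAS = 127   # assumed exponent bias (OCP MX spec)
E8M0_NAN = 0xFF   # assumed NaN encoding: all eight bits set


def decode_e8m0(bits: int) -> float:
    """Decode one E8M0 byte into a Python float (hypothetical helper)."""
    if not 0 <= bits <= 0xFF:
        raise ValueError("E8M0 values are exactly 8 bits wide")
    if bits == E8M0_NAN:
        return math.nan
    # Implicit mantissa of 1.0, so the value is a pure power of two.
    return math.ldexp(1.0, bits - E8M0_BIAS)


# The gap between adjacent encodings doubles at every step,
# which is the floating-point-like (non-constant) spacing.
gaps = [decode_e8m0(b + 1) - decode_e8m0(b) for b in (126, 127, 128)]
```

Under these assumptions every non-NaN encoding is a positive power of two, so there is no bit pattern for zero, a negative value, or infinity.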

E8M0 is intended to be used as a scaling factor in block-scaled formats like MXFP8, which is exactly why it has no negatives, infinities, or zeros: none of these make sense for a scaling factor.
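
As a rough sketch of that use, the snippet below applies one shared E8M0 scale to a block of element values. It assumes the elements have already been decoded to Python floats and that the scale uses bias 127 with 0xFF as NaN (per the OCP MX spec); `dequantize_block` and `decode_e8m0` are hypothetical names, not an existing API.

```python
import math


def decode_e8m0(bits: int) -> float:
    """Hypothetical decoder: bias 127, 0xFF is NaN (assumed from the OCP MX spec)."""
    return math.nan if bits == 0xFF else math.ldexp(1.0, bits - 127)


def dequantize_block(scale_bits: int, elements: list[float]) -> list[float]:
    """Multiply every element of a block by the shared power-of-two scale."""
    scale = decode_e8m0(scale_bits)
    # Sign and zero come from the elements themselves; the scale only
    # shifts the block's dynamic range up or down.
    return [scale * e for e in elements]


block = dequantize_block(130, [1.0, -0.5, 0.0])  # scale is 2**(130 - 127)
```

Negative and zero results still appear after scaling, but they originate in the per-element values, which is why the scale itself does not need to encode them.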

https://github.com/llvm/llvm-project/pull/111028