[llvm] [APFloat] Add APFloat support for E8M0 type (PR #107127)

Durgadoss R via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 5 11:58:05 PDT 2024


================
@@ -195,6 +195,12 @@ struct APFloatBase {
     // improved range compared to half (16-bit) formats, at (potentially)
     // greater throughput than single precision (32-bit) formats.
     S_FloatTF32,
+    // 8-bit floating point number with all 8 bits used for the exponent,
+    // like in FP32. There are no zeroes, no infinities, and no denormal values.
+    // NaN is represented with all bits set to 1. Bias is 127.
+    // This represents the scale data type in the MX specification from
+    // https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
+    S_Float8E8M0FN,
----------------
durga4github wrote:

Oh, the FN suffix means "Finite", i.e. the type has no infinities. This is consistent with the naming used for other similar types here.
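To illustrate the encoding described in the diff comment, here is a minimal C++ sketch that maps E8M0 bytes to the power of two they represent. The names `decodeE8M0` and `encodeE8M0` are hypothetical helpers, not part of APFloat. The sketch assumes only what the comment states: all 8 bits hold the exponent (so there is no sign or mantissa), the bias is 127, the all-ones pattern is NaN, and there are no zero, infinity, or denormal encodings.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>

// Hypothetical helpers (not LLVM APIs) illustrating the E8M0 layout:
// 8 exponent bits, no sign bit, no mantissa bits, bias 127.
constexpr int kE8M0Bias = 127;
constexpr std::uint8_t kE8M0NaN = 0xFF; // all bits set encodes NaN

// Decodes an E8M0 byte into the power of two it represents.
// Every non-NaN encoding maps to 2^(bits - 127); there is no zero,
// infinity, or denormal encoding.
double decodeE8M0(std::uint8_t bits) {
  if (bits == kE8M0NaN)
    return std::nan("");
  return std::ldexp(1.0, static_cast<int>(bits) - kE8M0Bias);
}

// Encodes an unbiased exponent e as the E8M0 scale 2^e.
// Returns std::nullopt if e is outside the representable range
// [-127, 127]; e = 128 would collide with the NaN encoding.
std::optional<std::uint8_t> encodeE8M0(int unbiasedExp) {
  if (unbiasedExp < -kE8M0Bias || unbiasedExp > 254 - kE8M0Bias)
    return std::nullopt;
  return static_cast<std::uint8_t>(unbiasedExp + kE8M0Bias);
}
```

For example, byte 127 decodes to 1.0, byte 0 decodes to 2^-127 (the smallest representable scale), and byte 254 decodes to 2^127 (the largest).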

https://github.com/llvm/llvm-project/pull/107127

