[llvm] [IR][Float8] Add two kinds float8 IR type (PR #89900)

Joshua Cranmer via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 24 11:01:11 PDT 2024


================
@@ -3871,9 +3879,9 @@ Floating-Point Types
    * - ``ppc_fp128``
      - 128-bit floating-point value (two 64-bits)
 
-The binary format of half, float, double, and fp128 correspond to the
-IEEE-754-2008 specifications for binary16, binary32, binary64, and binary128
-respectively.
+The binary format of float8e5m2, half, float, double, and fp128 correspond
+to the IEEE-754-2008 specifications for binary8, binary16, binary32, binary64,
----------------
jcranmer-intel wrote:

Neither IEEE 754-2008 nor IEEE 754-2019 defines a binary8 type. The table for binaryk actually applies only to k >= 128 with k % 32 == 0. And even if you were to ignore that restriction, the formulas you would get for k = 8 give a p of 9 bits and a w of -1 bits.
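
To make the arithmetic concrete, here is a minimal sketch of the generic binaryk interchange-format formulas as I read them from IEEE 754-2008 (w = round(4 * log2(k)) - 13 and p = k - w). The function name `binaryk_params` is made up for illustration, and the sketch deliberately skips the k >= 128, k % 32 == 0 check so that the k = 8 case can be evaluated:

```python
import math

def binaryk_params(k):
    """Return (p, w) from the generic binaryk formulas.

    The standard only intends these formulas for k >= 128 with
    k % 32 == 0; that restriction is intentionally not enforced here.
    """
    w = round(4 * math.log2(k)) - 13  # exponent field width in bits
    p = k - w                         # precision in bits (incl. implicit bit)
    return p, w

# In range: matches binary128 (p = 113, w = 15).
print(binaryk_params(128))
# Out of range: a negative exponent width, so no usable format results.
print(binaryk_params(8))
```

For k = 8 this gives p = 9 and w = -1, which is nothing like the 5-bit exponent and 2-bit mantissa of float8e5m2.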

https://github.com/llvm/llvm-project/pull/89900

