[clang] [llvm] [APFloat] Add support for f8E4M3 IEEE 754 type (PR #97179)

Alexander Pivovarov via cfe-commits cfe-commits at lists.llvm.org
Sun Jun 30 11:30:45 PDT 2024


================
@@ -136,6 +136,7 @@ static constexpr fltSemantics semIEEEquad = {16383, -16382, 113, 128};
 static constexpr fltSemantics semFloat8E5M2 = {15, -14, 3, 8};
 static constexpr fltSemantics semFloat8E5M2FNUZ = {
     15, -15, 3, 8, fltNonfiniteBehavior::NanOnly, fltNanEncoding::NegativeZero};
+static constexpr fltSemantics semFloat8E4M3 = {7, -6, 4, 8};
----------------
apivovarov wrote:

The `f8E4M3` type follows the IEEE 754 conventions:
- Exponent bias: 7
- Maximum stored exponent value: 14 (binary 1110)
- Maximum unbiased exponent value: 14 - 7 = 7
- Minimum stored exponent value: 1 (binary 0001)
- Minimum unbiased exponent value: 1 - 7 = -6
- Precision is the total number of significand (mantissa) bits, including the implicit leading integer bit: 3 + 1 = 4
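
The derivation above can be sketched as a small standalone check. This is only an illustration: `Ieee754Layout` and `deriveLayout` are hypothetical names that do not exist in APFloat, and the sketch assumes the usual IEEE 754 layout (one sign bit, all-ones exponent reserved for Inf/NaN, all-zeros exponent reserved for zero and subnormals).

```cpp
// Hypothetical helper, not part of APFloat: derives the exponent range and
// precision of an IEEE-754-style binary format from its field widths.
struct Ieee754Layout {
  int maxExponent;
  int minExponent;
  unsigned precision;
  unsigned sizeInBits;
};

constexpr Ieee754Layout deriveLayout(unsigned exponentBits,
                                     unsigned mantissaBits) {
  // Bias is 2^(e-1) - 1, e.g. 7 for a 4-bit exponent field.
  const int bias = (1 << (exponentBits - 1)) - 1;
  // The all-ones stored exponent is reserved for Inf/NaN.
  const int maxStored = (1 << exponentBits) - 2;
  // The all-zeros stored exponent is reserved for zero and subnormals.
  const int minStored = 1;
  return {maxStored - bias, minStored - bias,
          mantissaBits + 1,                  // stored bits + implicit bit
          1 + exponentBits + mantissaBits};  // sign + exponent + mantissa
}

// E4M3: 4 exponent bits, 3 mantissa bits -> {7, -6, 4, 8}.
static_assert(deriveLayout(4, 3).maxExponent == 7, "max exponent");
static_assert(deriveLayout(4, 3).minExponent == -6, "min exponent");
static_assert(deriveLayout(4, 3).precision == 4, "precision");
static_assert(deriveLayout(4, 3).sizeInBits == 8, "size in bits");
```

The same rule applied to 5 exponent bits and 2 mantissa bits gives `{15, -14, 3, 8}`, which matches the existing `semFloat8E5M2` entry in the hunk above.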

`fltSemantics semFloat8E4M3 = {7, -6, 4, 8};`:
- maxExponent = 7
- minExponent = -6
- precision = 4
- sizeInBits = 8
- nonFiniteBehavior = fltNonfiniteBehavior::IEEE754 (default, not spelled out in the initializer)
- nanEncoding = fltNanEncoding::IEEE (default, not spelled out in the initializer)
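
To make these fields concrete, here is a rough sketch of the value range they imply under IEEE 754 rules. `SemanticsSketch`, `pow2`, and the three helper functions are hypothetical names used only for illustration; the struct mirrors just the first four `fltSemantics` fields and is not the LLVM type.

```cpp
// Hypothetical mirror of the first four fltSemantics fields (not the LLVM type).
struct SemanticsSketch {
  int maxExponent;
  int minExponent;
  unsigned precision;
  unsigned sizeInBits;
};

// 2^e as a double; exact for the small exponents used here.
constexpr double pow2(int e) {
  double r = 1.0;
  for (; e > 0; --e) r *= 2.0;
  for (; e < 0; ++e) r /= 2.0;
  return r;
}

// Largest finite value: significand 1.11...1 (precision bits) times 2^maxExponent.
constexpr double largestFinite(SemanticsSketch s) {
  return (2.0 - pow2(1 - static_cast<int>(s.precision))) * pow2(s.maxExponent);
}

// Smallest positive normal value: 1.0 times 2^minExponent.
constexpr double smallestNormal(SemanticsSketch s) {
  return pow2(s.minExponent);
}

// Smallest positive subnormal: one unit in the last place at minExponent.
constexpr double smallestSubnormal(SemanticsSketch s) {
  return pow2(s.minExponent - (static_cast<int>(s.precision) - 1));
}

constexpr SemanticsSketch kFloat8E4M3{7, -6, 4, 8};

static_assert(largestFinite(kFloat8E4M3) == 240.0, "1.875 * 2^7");
static_assert(smallestNormal(kFloat8E4M3) == 0.015625, "2^-6");
static_assert(smallestSubnormal(kFloat8E4M3) == 0.001953125, "2^-9");
```

Under these assumptions the format covers magnitudes from 2^-9 up to 240, with Inf and NaN taking the all-ones exponent encodings.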

https://github.com/llvm/llvm-project/pull/97179

