[Mlir-commits] [mlir] Extending UniformQuantizedType with interface-based support for new storage types in Quant dialect (PR #152966)

Mon Feb 9 04:25:35 PST 2026

================
@@ -159,6 +192,34 @@ def Builtin_Float8E4M3FN : Builtin_FloatType<"Float8E4M3FN", "f8E4M3FN"> {
 
     Described in: https://arxiv.org/abs/2209.05433
   }];
+
+  let extraClassDeclaration = [{
+    /// QuantStorageTypeInterface method implementations
+    /// Whether the storage type should default to signed when used in quantization.
+    bool shouldDefaultToSigned() const { return true; }
+    /// Get the bit width of this 8-bit floating point type.
+    unsigned getStorageWidth() const { return 8; }
+    
+    /// Get default maximum value for this 8-bit floating point type.
+    int64_t getDefaultMaximum(bool isSigned) const { return 448; }
+    /// Get default minimum value for this 8-bit floating point type.
+    int64_t getDefaultMinimum(bool isSigned) const { return -getDefaultMaximum(isSigned); }
+    
+    /// Get the storage type as a string.
+    std::string getStorageTypeName(bool isSigned) const { return "f8E4M3FN"; }
+
+    /// Check if this 8-bit floating point type uses packed representation.
+    bool isPacked() const { return false; }
+
+    /// Get the logical bit width per value for this 8-bit floating point type.
+    unsigned getLogicalBitWidth() const { return 8; }
+
+    /// Get the number of logical elements that fit in one byte for this 8-bit floating point type.
----------------
javedabsar1 wrote:

Maybe clearer as: “Returns how many values of this type can fit in a byte, e.g. 1 for ...
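To make the suggested doc wording concrete, here is a minimal sketch. It assumes the per-byte count is 8 divided by the logical bit width, rounded down. That gives 1 for an 8-bit type such as f8E4M3FN and, as an assumption about packed types, 2 for a 4-bit type. The sketch also shows where the 448 returned by `getDefaultMaximum` comes from. `elementsPerByte` and `f8E4M3FNMaxFinite` are hypothetical names used only for illustration. They are not the patch's method names, which are cut off in the hunk above, and they are not MLIR API.

```cpp
// Sketch only: `elementsPerByte` and `f8E4M3FNMaxFinite` are hypothetical
// helpers for illustration, not part of MLIR's QuantStorageTypeInterface.

// Number of whole logical elements that fit in one 8-bit byte.
// Returns 0 for a width of 0 or for types wider than a byte.
constexpr unsigned elementsPerByte(unsigned logicalBitWidth) {
  return (logicalBitWidth == 0 || logicalBitWidth > 8) ? 0u
                                                       : 8u / logicalBitWidth;
}

// Largest finite f8E4M3FN value. The format has a 4-bit exponent (bias 7)
// and a 3-bit mantissa. It has no infinities, and only the all-ones pattern
// S.1111.111 encodes NaN. The largest finite value therefore uses exponent
// field 0b1111 with mantissa 0b110: 2^(15 - 7) * (1 + 6/8) = 256 * 1.75 = 448.
constexpr double f8E4M3FNMaxFinite() {
  constexpr int bias = 7;
  constexpr int maxExponentField = 15;
  constexpr int maxMantissaField = 6;  // 0b110
  constexpr int mantissaBits = 3;
  return static_cast<double>(1 << (maxExponentField - bias)) *
         (1.0 + static_cast<double>(maxMantissaField) / (1 << mantissaBits));
}

// Compile-time checks of the examples in the suggested doc comment.
static_assert(elementsPerByte(8) == 1, "an 8-bit float fills one byte");
static_assert(elementsPerByte(4) == 2, "two 4-bit values pack into a byte");
static_assert(f8E4M3FNMaxFinite() == 448.0, "matches getDefaultMaximum()");
```

Deriving the count from the logical bit width would keep the per-byte value consistent with the logical-bit-width method for every storage type. The trade-off is that non-divisor widths such as 3 bits are silently floored, so a real implementation may prefer to reject them.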

https://github.com/llvm/llvm-project/pull/152966