[Mlir-commits] [mlir] Sub-channel quantized type implementation (PR #120172)

Wed Mar 12 17:58:11 PDT 2025

================
@@ -410,6 +410,123 @@ int32_t UniformQuantizedPerAxisType::getQuantizedDimension() const {
   return getImpl()->quantizedDimension;
 }
 
+UniformQuantizedSubChannelType UniformQuantizedSubChannelType::get(
+    unsigned flags, Type storageType, Type expressedType,
+    DenseElementsAttr scales, DenseElementsAttr zeroPoints,
+    ArrayRef<int32_t> quantizedDimensions, ArrayRef<int64_t> blockSizes,
+    int64_t storageTypeMin, int64_t storageTypeMax) {
+  return Base::get(storageType.getContext(), flags, storageType, expressedType,
+                   scales, zeroPoints, quantizedDimensions, blockSizes,
+                   storageTypeMin, storageTypeMax);
+}
+
+UniformQuantizedSubChannelType UniformQuantizedSubChannelType::getChecked(
+    function_ref<InFlightDiagnostic()> emitError, unsigned flags,
+    Type storageType, Type expressedType, DenseElementsAttr scales,
+    DenseElementsAttr zeroPoints, ArrayRef<int32_t> quantizedDimensions,
+    ArrayRef<int64_t> blockSizes, int64_t storageTypeMin,
+    int64_t storageTypeMax) {
+  return Base::getChecked(emitError, storageType.getContext(), flags,
+                          storageType, expressedType, scales, zeroPoints,
+                          quantizedDimensions, blockSizes, storageTypeMin,
+                          storageTypeMax);
+}
+
+LogicalResult UniformQuantizedSubChannelType::verifyInvariants(
+    function_ref<InFlightDiagnostic()> emitError, unsigned flags,
+    Type storageType, Type expressedType, DenseElementsAttr scales,
+    DenseElementsAttr zeroPoints, ArrayRef<int32_t> quantizedDimensions,
+    ArrayRef<int64_t> blockSizes, int64_t storageTypeMin,
+    int64_t storageTypeMax) {
+  if (failed(QuantizedType::verifyInvariants(emitError, flags, storageType,
+                                             expressedType, storageTypeMin,
+                                             storageTypeMax))) {
+    return failure();
+  }
+
+  // Uniform quantization requires fully expressed parameters, including
+  // expressed type.
+  if (!expressedType)
+    return emitError() << "uniform quantization requires expressed type";
----------------
sdasgup3 wrote:

There are two verification methods introduced for sub-channel quantization, and they are complementary: (A) `UniformQuantizedSubChannelType::verifyInvariants` and (B) `verifySubChannelQuantization`.

(A) covers all the checks that can be performed at the type level alone, i.e., without knowing the container tensor type. For example:
 - The expressed type is floating point.
 - The scale type matches `expressedType`.
 - The zero-point type matches `storageType`.
 - The shapes of scales and zeroPoints match.
 - The number of quantized dimensions matches the number of block sizes.
 - Each quantized dimension is >= 0.
 - Each block size is > 0.

(A) is invoked as part of the `QuantizedType` ctor.
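To make the split concrete, here is a minimal, MLIR-free C++ sketch of the type-level checks in (A). `ElemType`, `SubChannelParams`, and `verifyTypeLevel` are hypothetical names used only for illustration. The real `verifyInvariants` works on `mlir::Type` and `DenseElementsAttr` and reports failures through `emitError`, whereas this sketch returns the first failure message (empty on success). Note that nothing in it refers to a container tensor type.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical element-type tags standing in for MLIR builtin types.
enum class ElemType { I4, I8, I16, F16, BF16, F32 };

bool isFloat(ElemType t) {
  return t == ElemType::F16 || t == ElemType::BF16 || t == ElemType::F32;
}

// Hypothetical bundle of the type parameters; no container type appears here.
struct SubChannelParams {
  ElemType storageType;
  ElemType expressedType;
  ElemType scaleType;      // element type of the scales attribute
  ElemType zeroPointType;  // element type of the zeroPoints attribute
  std::vector<int64_t> scalesShape;
  std::vector<int64_t> zeroPointsShape;
  std::vector<int32_t> quantizedDimensions;
  std::vector<int64_t> blockSizes;
};

// Type-level checks mirroring list (A). Returns "" on success, otherwise the
// first failing check's message.
std::string verifyTypeLevel(const SubChannelParams &p) {
  if (!isFloat(p.expressedType))
    return "expressed type must be floating point";
  if (p.scaleType != p.expressedType)
    return "scale type must match expressed type";
  if (p.zeroPointType != p.storageType)
    return "zero-point type must match storage type";
  if (p.scalesShape != p.zeroPointsShape)
    return "scales and zero-points must have the same shape";
  if (p.quantizedDimensions.size() != p.blockSizes.size())
    return "number of quantized dimensions and block sizes must match";
  for (int32_t d : p.quantizedDimensions)
    if (d < 0)
      return "quantized dimension must be non-negative";
  for (int64_t b : p.blockSizes)
    if (b <= 0)
      return "block size must be positive";
  return "";
}
```

Because every input is part of the type itself, these checks can run at construction time, before any op that uses the type exists.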

(B) performs the complementary checks once information about the container type is available. (B) is invoked as part of the quant dialect ops' verification, e.g., `quant.qcast` and `quant.dcast`. Some example checks in (B) are:

 - The container type must be a ranked tensor type.
 - `dim(scales, i) = dim(container_type, i) / block_sizes(i)`, etc.
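A companion sketch for (B), under the same caveats: `ContainerShape` and `verifyAgainstContainer` are hypothetical names, an unranked tensor is modeled as `std::nullopt`, and only static, non-zero dimension sizes are handled. Three behaviors are assumptions of this sketch rather than statements from the PR: dimensions not listed in `quantizedDimensions` are treated as one block spanning the whole dimension (so their scale extent is 1), block sizes must evenly divide their dimension, and each quantized dimension must be less than the container rank (the natural counterpart of the `>= 0` check in (A)).

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical container description: std::nullopt models an unranked tensor,
// otherwise the vector holds the static dimension sizes.
using ContainerShape = std::optional<std::vector<int64_t>>;

// Container-dependent checks in the spirit of list (B). Returns "" on success.
std::string verifyAgainstContainer(
    const ContainerShape &container, const std::vector<int64_t> &scalesShape,
    const std::vector<int32_t> &quantizedDimensions,
    const std::vector<int64_t> &blockSizes) {
  if (!container)
    return "container type must be a ranked tensor";
  const std::vector<int64_t> &dims = *container;
  if (scalesShape.size() != dims.size())
    return "scales rank must match container rank";

  // Assumption: unlisted dimensions use the full dimension as their block.
  std::vector<int64_t> effectiveBlocks(dims);
  for (std::size_t k = 0; k < quantizedDimensions.size(); ++k) {
    int32_t d = quantizedDimensions[k];
    if (d < 0 || static_cast<std::size_t>(d) >= dims.size())
      return "quantized dimension out of range";
    effectiveBlocks[d] = blockSizes.at(k);  // .at() guards a size mismatch
  }

  for (std::size_t i = 0; i < dims.size(); ++i) {
    if (dims[i] <= 0 || effectiveBlocks[i] <= 0)
      return "sketch only handles positive dimension and block sizes";
    // Assumption: the block size must evenly divide the dimension size.
    if (dims[i] % effectiveBlocks[i] != 0)
      return "block size must divide dimension size";
    if (scalesShape[i] != dims[i] / effectiveBlocks[i])
      return "dim(scales, i) must equal dim(container, i) / block_size(i)";
  }
  return "";
}
```

For example, a `4x8` container with `quantizedDimensions = [0, 1]` and `blockSizes = [2, 4]` requires a `2x2` scales shape, since 4 / 2 = 2 and 8 / 4 = 2. None of these checks can run in (A), because the container shape is unknown there.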

https://github.com/llvm/llvm-project/pull/120172