[Mlir-commits] [mlir] [mlir] Improvements to the 'quant' dialect (PR #100667)
Rafael Ubal
llvmlistbot at llvm.org
Thu Sep 26 10:42:46 PDT 2024
================
@@ -0,0 +1,297 @@
+//===- QuantBase.td - Quantization dialect base ------------*- tablegen -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Quantization dialect, types, and traits.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef QUANT_BASE
+#define QUANT_BASE
+
+include "mlir/IR/OpBase.td"
+
+def Quant_Dialect : Dialect {
+ let name = "quant";
+ let description = [{
+ The `quant` dialect offers a framework for defining and manipulating
+ quantized values. Central to this framework is the `!quant.uniform` data
+ type, used to represent quantized values. This dialect also provides a
+ suite of operations to handle and convert quantized values between their
+ original floating-point representations and the optimized, lower bit-width
+ integer representations. The `quant` dialect is complemented by
+ transformation passes that lower these operations into other core MLIR
+ dialects, while flattening all occurrences of quantized types into their
+ integer counterparts.
+
+
+ ## The `!quant.uniform` type
+
+ The quantization process establishes a relationship between two types of
+ values: an *expressed value* and a *stored value*. The former refers to the
+ floating-point representation used in an original machine learning model,
+ capturing the precise numerical characteristics needed for accurate
+ calculations. The latter is the simplified integer representation that
+ resides in memory after quantization. The `!quant.uniform` data type
+ encodes the necessary information for (lossy) round-trip conversion between
+ an expressed and a stored value.
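+
+ As an intuition for this relationship, uniform quantization typically
+ relates the two values through an affine mapping of the form shown below
+ (the precise rounding and clamping behavior is specified in the
+ documentation of the individual operations):
+
+ ```
+ expressedValue = (storedValue - zeroPoint) * scale
+ ```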
+
+ The `!quant.uniform` type has two variants: per-layer quantization and
+ per-channel (or per-axis) quantization. In per-layer quantization, the
+ quantization information affects an entire tensor uniformly. Conversely, in
+ per-channel quantization, the data type encodes the specific tensor axis
+ that serves as the channel and includes quantization information for each
+ individual channel within the tensor. Below are the specific syntactic and
+ semantic considerations for each modality.
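+
+ For illustration, a per-layer and a per-channel quantized type may look as
+ follows (the exact grammar of each variant is given in the corresponding
+ section below):
+
+ ```
+ // Per-layer: a single scale (2.0) and zero point (10) apply to the
+ // entire tensor.
+ !quant.uniform<i8:f32, 2.0:10>
+
+ // Per-channel: axis 1 is the channel axis, with one scale:zeroPoint
+ // pair per channel.
+ !quant.uniform<i8:f32:1, {2.0:10, 3.0:20}>
+ ```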
+
+
+ ### Per-layer quantization
+
+ This is the general syntax of the `!quant.uniform` type representing
+ per-layer quantization:
+
+ ```
+ `!quant.uniform` `<`
+ storedType (`<` storageMin `:` storageMax `>`)? `:`
+ expressedType `,`
+ scale (`:` zeroPoint)?
+ `>`
+ ```
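+
+ For example, the following per-layer type stores values as `i8` restricted
+ to the range `[-127, 127]`, expresses them as `f32`, and uses a scale of
+ `0.1` together with a zero point of `10`:
+
+ ```
+ !quant.uniform<i8<-127:127>:f32, 0.1:10>
+ ```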
+
+ The type contains the following parameters:
+
+ - `storedType`: Integer type of the value stored in memory. This type
+ conveys the bit width and signedness of the quantized stored value.
+ Signed integer types are represented as `'i' bitWidth` (e.g., `i8`),
+ while unsigned integer types are represented as `'u' bitWidth` (e.g.,
+ `u8`).
+
+ - `storageMin`, `storageMax`: Optional bounds for the stored value. If
+ given, they must be within the range of `storedType`. If omitted, the
+ entire range of `storedType` is allowed (e.g., `-128...127` for `i8` or
+ `0...255` for `u8`).
+
+ - `expressedType`: Floating-point type of the value expressed by this
+ quantized type (e.g., `f32`, `f80`, `bf16`, or `tf32`).
+
+ - `scale`: Floating-point value of type `expressedType` used in the
----------------
rafaelubalmw wrote:
I have included checks verifying that the scale is within the range of the minimum and maximum representable floats in `expressedType`. I've added those checks for both `UniformQuantizedType` and `UniformPerAxisQuantizedType`.
I was going to add boundary checks for `zeroPoint`, as done in the StableHLO code you're pointing to. However, I have concerns about the legitimacy of those checks. Whether zero points outside of the min/max storage values are reasonable depends on the quantization/dequantization algorithm. In fact, in the algorithm proposed here for the lowerings of `quant.qcast` and `quant.dcast`, the zero point is first converted to `expressedType` (see the included TableGen documentation for the ops), so it may be OK for it to lie outside of the storage type range.
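To illustrate the concern schematically (this is only a sketch of the dequantization step as I understand the documented lowering, not the exact code; `toExpressedType` is just illustrative notation for the conversion to `expressedType`):

```
// The zero point participates in arithmetic only after being cast to
// expressedType, so a zeroPoint outside the i8 storage range (say, 200)
// can still yield a well-defined expressed value.
expressedValue = scale * (toExpressedType(storedValue) - toExpressedType(zeroPoint))
```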
https://github.com/llvm/llvm-project/pull/100667