[all-commits] [llvm/llvm-project] 81d7ee: Sub-channel quantized type implementation (#120172)

Sandeep Dasgupta via All-commits all-commits at lists.llvm.org
Sun Mar 23 05:38:16 PDT 2025


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 81d7eef13453f21303acfba773d0903b27ad754b
      https://github.com/llvm/llvm-project/commit/81d7eef13453f21303acfba773d0903b27ad754b
  Author: Sandeep Dasgupta <sdasgup at google.com>
  Date:   2025-03-23 (Sun, 23 Mar 2025)

  Changed paths:
    M mlir/include/mlir-c/Dialect/Quant.h
    M mlir/include/mlir/Dialect/Quant/IR/QuantBase.td
    M mlir/include/mlir/Dialect/Quant/IR/QuantDialectBytecode.td
    M mlir/include/mlir/Dialect/Quant/IR/QuantTypes.h
    M mlir/include/mlir/Dialect/Quant/Transforms/Passes.td
    M mlir/lib/Bindings/Python/DialectQuant.cpp
    M mlir/lib/CAPI/Dialect/Quant.cpp
    M mlir/lib/Dialect/Quant/IR/QuantDialectBytecode.cpp
    M mlir/lib/Dialect/Quant/IR/QuantOps.cpp
    M mlir/lib/Dialect/Quant/IR/QuantTypes.cpp
    M mlir/lib/Dialect/Quant/IR/TypeDetail.h
    M mlir/lib/Dialect/Quant/IR/TypeParser.cpp
    M mlir/lib/Dialect/Quant/Transforms/CMakeLists.txt
    M mlir/lib/Dialect/Quant/Transforms/LowerQuantOps.cpp
    A mlir/lib/Dialect/Quant/Transforms/NormalizeQuantTypes.cpp
    M mlir/python/mlir/_mlir_libs/_mlir/dialects/quant.pyi
    M mlir/test/CAPI/quant.c
    M mlir/test/Dialect/Quant/Bytecode/types.mlir
    M mlir/test/Dialect/Quant/invalid.mlir
    M mlir/test/Dialect/Quant/lower-quant-ops.mlir
    A mlir/test/Dialect/Quant/normalize-quant-types.mlir
    M mlir/test/Dialect/Quant/ops.mlir
    M mlir/test/Dialect/Quant/parse-uniform-invalid.mlir
    M mlir/test/Dialect/Quant/parse-uniform.mlir
    M mlir/test/python/dialects/quant.py

  Log Message:
  -----------
  Sub-channel quantized type implementation (#120172)

This is an implementation for [RFC: Supporting Sub-Channel Quantization
in
MLIR](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694).

In order to make the review process easier, the PR has been divided into
the following labeled commits:

1. **Add implementation for sub-channel type:** Includes the class
design for `UniformQuantizedSubChannelType`, printer/parser and bytecode
read/write support. The existing types (per-tensor and per-axis) are
unaltered.
2. **Add lowering for sub-channel type:** Lowering of
`quant.qcast` and `quant.dcast` operations to Linalg operations
(a sketch of the type and these casts follows this list).
3. **Add C/Python APIs:** We first define the C APIs and build the
Python APIs on top of them.
4. **Add pass to normalize generic ....:** This pass normalizes
sub-channel quantized types to per-tensor or per-axis types, if possible.
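
For orientation, here is a minimal, hypothetical sketch of the sub-channel
type together with the `quant.qcast`/`quant.dcast` casts around it; the
shapes, block sizes, and scale/zero-point values (2.0:10, 3.0:20) are
illustrative placeholders, not taken from the patch:

```
// Hypothetical sub-channel type for a 2x4 tensor: axis 0 uses block size 2
// (one block, the degenerate case) and axis 1 uses block size 2 (two blocks),
// so the scales/zero-points tensor has shape [1, 2].
!quant.uniform<i8:f32:{0:2, 1:2}, {{2.0:10, 3.0:20}}>

// Quantize and dequantize casts around that type; %fp is assumed to be a
// tensor<2x4xf32> value defined earlier. The lowering commit turns these
// casts into Linalg operations.
%q   = quant.qcast %fp : tensor<2x4xf32>
         to tensor<2x4x!quant.uniform<i8:f32:{0:2, 1:2}, {{2.0:10, 3.0:20}}>>
%fp2 = quant.dcast %q : tensor<2x4x!quant.uniform<i8:f32:{0:2, 1:2}, {{2.0:10, 3.0:20}}>>
         to tensor<2x4xf32>
```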


A design note:
- **Explicitly storing the `quantized_dimensions`, even when they can be
derived for ranked tensors.**
While it is possible to infer the quantized dimensions from the static shape
of the scales (or zero-points) tensor for ranked data tensors
([ref](https://discourse.llvm.org/t/rfc-supporting-sub-channel-quantization-in-mlir/82694/3)
for background), there are cases where this can lead to ambiguity and
issues with round-tripping.

Consider the example:

```
tensor<2x4x!quant.uniform<i8:f32:{0:2, 1:2}, {{s00:z00, s01:z01}}>>
```

The shape of the scales tensor is [1, 2], which might suggest that only
axis 1 is quantized. That inference is technically correct, since the block
size for axis 0 is a degenerate case (equal to the dimension size), but
relying on it can cause problems with round-tripping. Therefore, even for
ranked tensors, the quantized dimensions are stored explicitly.
Suggestions welcome!
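
To make the round-tripping concern concrete, here is a hedged illustration
using the same placeholder scales/zero-points as the example above:

```
// Sub-channel form with the quantized dimensions stored explicitly; the
// axis-0 block size equals the dimension size, so the scales tensor has
// shape [1, 2].
tensor<2x4x!quant.uniform<i8:f32:{0:2, 1:2}, {{s00:z00, s01:z01}}>>

// If the degenerate axis-0 entry were inferred away instead of stored, the
// type could collapse to the existing per-axis form along axis 1, which is
// a different type and would not round-trip back to the sub-channel form:
tensor<2x4x!quant.uniform<i8:f32:1, {s00:z00, s01:z01}>>
```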


PS: I understand that the upcoming holidays may impact your schedule, so
please take your time with the review. There's no rush.



To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications

