[Mlir-commits] [mlir] Extending UniformQuantizedType with interface-based support for new storage types in Quant dialect (PR #152966)
Anurag Singh
llvmlistbot at llvm.org
Wed Aug 13 23:04:14 PDT 2025
anuragsingh-tt wrote:
Great discussion.
We have a similar approach to @ZoranZomborat for static/dynamic quantization in our compiler in that the storage type is separated during lowering and quantization parameters (scales, LUTs, etc.) are handled at the op/type level. I agree with the idea that storage type should just be a clean carrier that describes how the bits are stored (signedness, width, packing). In the NF4 case described above, the LUT could be an operand in the dynamic case and an attribute in the static case.
To make new storage types easy to use in generic passes and easy to legalize for different backends, it might help if the interface exposed a few basic packing/alignment facts:
• `bool isPacked()` as suggested
• `unsigned getLogicalBitWidth()` (e.g., 4 for NF4)
• `unsigned getElementsPerByte()` (e.g., 2 for NF4)
• `Optional<unsigned> getPreferredAlignmentBytes()` (for DMA/vector/tile alignment)
These allow MLIR transforms to compute correct byte strides, respect sub-byte legality rules and insert pack/unpack only where necessary. I realize some of these may lean toward backend implementation details but even simple defaults would let generic passes make safe decisions while still allowing hardware-specific encodings to refine them.
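To make the byte-stride point concrete, here is a minimal standalone sketch of how a generic pass might consume these four facts. It is not MLIR code: `StorageTypeInfo`, `getStorageBytes`, `getRowStrideBytes`, and `needsUnpackForElementAccess` are hypothetical names invented for illustration, and `std::optional` stands in for the `Optional` spelled above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical plain-struct stand-in for the proposed interface methods.
// None of these names exist in MLIR; they only mirror the bullets above.
struct StorageTypeInfo {
  bool isPacked;                                    // isPacked()
  unsigned logicalBitWidth;                         // getLogicalBitWidth()
  unsigned elementsPerByte;                         // getElementsPerByte()
  std::optional<unsigned> preferredAlignmentBytes;  // getPreferredAlignmentBytes()
};

// Bytes needed to store `numElements` values. For packed types a partially
// filled trailing byte is rounded up; unpacked types use whole bytes per value.
inline std::uint64_t getStorageBytes(const StorageTypeInfo &t,
                                     std::uint64_t numElements) {
  if (t.isPacked)
    return (numElements + t.elementsPerByte - 1) / t.elementsPerByte;
  return numElements * ((t.logicalBitWidth + 7) / 8);
}

// Row stride in bytes, padded up to the preferred alignment when one is given.
// With no stated preference the stride is just the tightly packed row size.
inline std::uint64_t getRowStrideBytes(const StorageTypeInfo &t,
                                       std::uint64_t rowElements) {
  std::uint64_t bytes = getStorageBytes(t, rowElements);
  std::uint64_t align = t.preferredAlignmentBytes.value_or(1u);
  return (bytes + align - 1) / align * align;
}

// A pass only needs pack/unpack when several elements share one byte.
inline bool needsUnpackForElementAccess(const StorageTypeInfo &t) {
  return t.isPacked && t.elementsPerByte > 1;
}
```

With an NF4-like description (`isPacked=true`, 4 bits, 2 elements per byte, 16-byte preferred alignment), a row of 100 elements occupies 50 bytes and gets a 64-byte stride, while an unpacked int8 carrier never needs a pack/unpack step.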
BTW, there are also lower-bit formats like i2 and i1 gaining traction.
• For ternary, the “canonical” compact encoding is 2 bits per value, so `isPacked=true`, `bitWidth=2`, `elementsPerByte=4`. If a backend instead stores ternary as signed int8 `(−1,0,+1)` for simplicity, it would be `isPacked=false`, `bitWidth=8`, `elementsPerByte=1`. The interface makes both legal, and the layout/encoding decides which is used where.
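As an illustration of the compact ternary case, the sketch below packs four values per byte and unpacks them again. The 2-bit code assignment (`value + 1`, so −1 → 0b00, 0 → 0b01, +1 → 0b10) and the low-bits-first ordering within a byte are assumptions made for this example only; `packTernary` and `unpackTernary` are hypothetical helpers, and a real backend would choose its own encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack ternary values {-1, 0, +1} at 2 bits each, 4 values per byte.
// Assumed encoding (illustrative only): code = value + 1, and element i
// lives at bit offset 2 * (i % 4) of byte i / 4.
inline std::vector<std::uint8_t>
packTernary(const std::vector<std::int8_t> &values) {
  std::vector<std::uint8_t> packed((values.size() + 3) / 4, 0);
  for (std::size_t i = 0; i < values.size(); ++i) {
    assert(values[i] >= -1 && values[i] <= 1 && "not a ternary value");
    auto code = static_cast<std::uint8_t>(values[i] + 1);
    packed[i / 4] |= static_cast<std::uint8_t>(code << (2 * (i % 4)));
  }
  return packed;
}

// Inverse of packTernary: recover `count` values from the packed bytes.
inline std::vector<std::int8_t>
unpackTernary(const std::vector<std::uint8_t> &packed, std::size_t count) {
  std::vector<std::int8_t> values(count);
  for (std::size_t i = 0; i < count; ++i) {
    unsigned code = (packed[i / 4] >> (2 * (i % 4))) & 0x3u;
    values[i] = static_cast<std::int8_t>(static_cast<int>(code) - 1);
  }
  return values;
}
```

The unpacked int8 variant needs neither helper, which is the practical difference the `isPacked` flag would signal to a generic pass.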
Misc. point: If someone brings up 6-bit (`fp6`) formats, those generally require a composite packing scheme (e.g., 4+2 bit streams) because 6 does not divide 8 evenly. That case argues for either (a) a richer packing descriptor than just `elementsPerByte`, or (b) modeling fp6 as a composite/“MX” type (as discussed above) rather than a single simple carrier. So these helpers might be useful for the common bit widths that divide 8, i.e. {1,2,4,8}, but outliers would need a separate path.
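One possible shape for option (a) is a descriptor that says "N elements per M bytes" instead of "N elements per byte". The sketch below is only an assumption about what such a descriptor could look like: `PackingDescriptor`, `makeDensePacking`, and `getPackedBytes` are hypothetical names, it models a single dense bitstream rather than split 4+2 streams, and it assumes a partially filled group is padded to a whole group.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// Hypothetical generalization of elementsPerByte: the smallest group of
// elements that fills a whole number of bytes.
struct PackingDescriptor {
  unsigned elementsPerGroup;
  unsigned bytesPerGroup;
};

// Derive the smallest byte-aligned group for a densely packed bit width.
// 4-bit -> 2 elements per 1 byte; 6-bit -> 4 elements per 3 bytes.
inline PackingDescriptor makeDensePacking(unsigned bitWidth) {
  assert(bitWidth > 0 && "bit width must be positive");
  unsigned g = std::gcd(bitWidth, 8u);
  return {8u / g, bitWidth / g};
}

// Bytes needed for `numElements`, rounding up to whole groups (assumption:
// a trailing partial group is padded rather than truncated).
inline std::uint64_t getPackedBytes(const PackingDescriptor &d,
                                    std::uint64_t numElements) {
  std::uint64_t groups =
      (numElements + d.elementsPerGroup - 1) / d.elementsPerGroup;
  return groups * d.bytesPerGroup;
}
```

For widths in {1,2,4,8} this collapses to `bytesPerGroup == 1`, so the simpler `elementsPerByte` helper remains a special case of the same description.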
https://github.com/llvm/llvm-project/pull/152966