[Mlir-commits] [mlir] [MLIR][XeGPU][Conversion] Add 2D block op support for sub byte types (PR #169099)
Jianhui Li
llvmlistbot at llvm.org
Thu Dec 4 13:52:27 PST 2025
================
@@ -268,9 +276,64 @@ class LoadStorePrefetchNdToXeVMPattern : public OpConversionPattern<OpType> {
op, "Expected offset rank to match descriptor rank.");
auto elemType = tdescTy.getElementType();
auto elemBitSize = elemType.getIntOrFloatBitWidth();
- if (elemBitSize % 8 != 0)
+ bool isSubByte = elemBitSize < 8;
+ uint64_t wScaleFactor = 1;
+
+ if (!isSubByte && (elemBitSize % 8 != 0))
return rewriter.notifyMatchFailure(
op, "Expected element type bit width to be multiple of 8.");
+ auto tileW = tdescTy.getDimSize(tileRank - 1);
+ // For sub byte types, only 4bits are currently supported.
+ if (isSubByte) {
+ if (elemBitSize != 4)
+ return rewriter.notifyMatchFailure(
+ op, "Only sub byte types of 4bits are supported.");
+ if (tileRank != 2)
+ return rewriter.notifyMatchFailure(
+ op, "Sub byte types are only supported for 2D tensor descriptors.");
+ auto subByteFactor = 8 / elemBitSize;
+ auto sub16BitFactor = subByteFactor * 2;
+ auto sub32BitFactor = sub16BitFactor * 2;
+ auto tileH = tdescTy.getDimSize(0);
+ if (tileW == executionSize * sub16BitFactor) {
+ // Usage case for loading as Matrix A operand
+ // Emulate with 16bit loads/stores.
+ // scaled_tileW = executionSize
+ elemType = rewriter.getIntegerType(16);
+ tileW = executionSize;
+ wScaleFactor = sub16BitFactor;
+ } else if (tileW == executionSize * sub32BitFactor) {
+ // Usage case for loading as pre-packed Matrix B operand
----------------
Jianhui-Li wrote:
I don't see such a use case. 4-bit activations and weights are presented as 8-bit values (each packed from 2 elements along the K dim). I suggest removing this section, as discussed.
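
For reference, the width-scaling arithmetic in the hunk above can be sketched standalone, outside MLIR. This is only an illustration of the 16-bit emulation branch (the Matrix A case), not the patch itself: `EmulatedTile` and `emulateSubByteTile` are hypothetical names, the MLIR types are replaced by plain integers, and the second branch is left out because the hunk is truncated there.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical result type (not in the patch): describes how a tile of
// sub-byte elements is re-expressed with a wider emulation element type.
struct EmulatedTile {
  unsigned elemBitSize;  // bit width of the emulation element type
  int64_t tileW;         // tile width counted in emulation elements
  uint64_t wScaleFactor; // sub-byte elements per emulation element
};

// Hypothetical helper mirroring the 16-bit emulation branch of the hunk.
// Returns std::nullopt for any shape that branch does not cover.
std::optional<EmulatedTile> emulateSubByteTile(unsigned elemBitSize,
                                               int64_t tileW,
                                               int64_t executionSize) {
  // As in the hunk, only 4-bit sub-byte types are handled.
  if (elemBitSize != 4)
    return std::nullopt;

  const int64_t subByteFactor = 8 / elemBitSize;    // 2 elements per byte
  const int64_t sub16BitFactor = subByteFactor * 2; // 4 elements per 16 bits

  // Matrix A case: emulate with 16-bit loads/stores, so the scaled tile
  // width equals the execution size.
  if (tileW == executionSize * sub16BitFactor)
    return EmulatedTile{16, executionSize,
                        static_cast<uint64_t>(sub16BitFactor)};

  return std::nullopt;
}
```

Assuming an execution size of 16, a 64-wide tile of 4-bit elements maps to 16 elements of 16 bits each with a scale factor of 4; both layouts cover the same 256 bits per row.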
https://github.com/llvm/llvm-project/pull/169099