[Mlir-commits] [mlir] [MLIR][XeGPU][Conversion] Add 2D block op support for sub byte types (PR #169099)

Thu Dec 4 13:52:27 PST 2025

================
@@ -268,9 +276,64 @@ class LoadStorePrefetchNdToXeVMPattern : public OpConversionPattern<OpType> {
           op, "Expected offset rank to match descriptor rank.");
     auto elemType = tdescTy.getElementType();
     auto elemBitSize = elemType.getIntOrFloatBitWidth();
-    if (elemBitSize % 8 != 0)
+    bool isSubByte = elemBitSize < 8;
+    uint64_t wScaleFactor = 1;
+
+    if (!isSubByte && (elemBitSize % 8 != 0))
       return rewriter.notifyMatchFailure(
           op, "Expected element type bit width to be multiple of 8.");
+    auto tileW = tdescTy.getDimSize(tileRank - 1);
+    // For sub byte types, only 4bits are currently supported.
+    if (isSubByte) {
+      if (elemBitSize != 4)
+        return rewriter.notifyMatchFailure(
+            op, "Only sub byte types of 4bits are supported.");
+      if (tileRank != 2)
+        return rewriter.notifyMatchFailure(
+            op, "Sub byte types are only supported for 2D tensor descriptors.");
+      auto subByteFactor = 8 / elemBitSize;
+      auto sub16BitFactor = subByteFactor * 2;
+      auto sub32BitFactor = sub16BitFactor * 2;
+      auto tileH = tdescTy.getDimSize(0);
+      if (tileW == executionSize * sub16BitFactor) {
+        // Usage case for loading as Matrix A operand
+        // Emulate with 16bit loads/stores.
+        //   scaled_tileW = executionSize
+        elemType = rewriter.getIntegerType(16);
+        tileW = executionSize;
+        wScaleFactor = sub16BitFactor;
+      } else if (tileW == executionSize * sub32BitFactor) {
+        // Usage case for loading as pre-packed Matrix B operand
----------------
Jianhui-Li wrote:

I don't see such a use case. 4-bit activations/weights are presented as 8-bit (each byte packed from 2 elements along the K dim). So I suggest removing this section, as discussed.
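
To illustrate the packing mentioned above, here is a minimal sketch of storing two 4-bit elements from the K dim in one 8-bit value. The helper name `packInt4Pairs` is hypothetical, and the nibble order (first element in the low nibble) is an assumption for illustration; neither is taken from this PR or from XeGPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the PR: packs consecutive pairs of 4-bit
// values along K into single bytes. Each input byte carries one 4-bit value
// in its low nibble; any upper bits are masked off.
// Assumption: the first element of each pair goes into the low nibble.
std::vector<uint8_t> packInt4Pairs(const std::vector<uint8_t> &vals) {
  assert(vals.size() % 2 == 0 && "expected an even number of 4-bit elements");
  std::vector<uint8_t> packed(vals.size() / 2);
  for (std::size_t i = 0; i < packed.size(); ++i) {
    uint8_t lo = vals[2 * i] & 0xF;     // first K element of the pair
    uint8_t hi = vals[2 * i + 1] & 0xF; // second K element of the pair
    packed[i] = static_cast<uint8_t>(lo | (hi << 4));
  }
  return packed;
}
```

With this layout, a row of K 4-bit elements occupies K / 2 bytes, so the operand can be described as an 8-bit tensor whose K extent is halved.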

https://github.com/llvm/llvm-project/pull/169099