[Mlir-commits] [mlir] [mlir][amdgpu] Add scaled_ext_packed{8, 16} operations (PR #159830)

Fri Oct 17 08:50:03 PDT 2025

================
@@ -150,10 +150,50 @@ def AMDGPU_ScaledExtPacked816Op
     When the block size is 32, `firstScaleByte` can be either 0 or 2,
     selecting halves of the scale vectors. Lanes 0-15 will read from
     `firstScaleByte` and lanes 16-31 will read from `firstScaleByte` + 1.
+    For example:
+    ```mlir
+    // Input: 8-element vector of F8E4M3FN, converting to F32
+    // Lanes 0-15 read from byte 0, lanes 16-31 read from byte 1
+    %result = amdgpu.scaled_ext_packed816 %source
+    scale(%scales)
+    blockSize(32)
+    firstScaleLane(0)
+    firstScaleByte(0)
+    : vector<8xf8E4M3FN>, vector<4xf8E8M0FNU> -> vector<8xf32>
+
+    // Input: 16-element vector of F6E2M3FN, converting to F16
+    // Lanes 0-15 read from byte 2, lanes 16-31 read from byte 3
+    %result = amdgpu.scaled_ext_packed816 %source
+    scale(%scales)
+    blockSize(32)
+    firstScaleLane(1)
+    firstScaleByte(2)
+    : vector<16xf6E2M3FN>, vector<4xf8E8M0FNU> -> vector<16xf16>
+    ```
 
     However, when the block size is 16, `firstScaleByte` can be 0 or 1.
     Lanes 0-15 read from the `firstScaleByte`th element of the scale vectors,
     while lanes 16-31 read from `firstScaleByte` + 2.
+    For example:
+    ```
----------------
kuhar wrote:

Start this code fence with `mlir` to enable syntax highlighting.

https://github.com/llvm/llvm-project/pull/159830
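
The lane-to-scale-byte rule in the quoted op description can be checked with a small model. The sketch below is in Python rather than MLIR and is not part of the patch. The function name `scale_byte_for_lane` is hypothetical, and the 32-lane range is assumed from the "lanes 0-31" wording in the hunk. It encodes only what the description states: with block size 32, `firstScaleByte` is 0 or 2 and lanes 16-31 read the next byte; with block size 16, `firstScaleByte` is 0 or 1 and lanes 16-31 read two bytes further on.

```python
def scale_byte_for_lane(lane: int, block_size: int, first_scale_byte: int) -> int:
    """Return the scale-vector byte index a lane reads, per the quoted description.

    Hypothetical helper for illustration only; it does not model the
    firstScaleLane parameter or the actual lowering.
    """
    if not 0 <= lane < 32:
        raise ValueError("lane must be in 0..31 (assumed 32-lane range)")

    if block_size == 32:
        # firstScaleByte selects a half of the scale vector: 0 or 2.
        allowed = (0, 2)
        upper_offset = 1  # lanes 16-31 read firstScaleByte + 1
    elif block_size == 16:
        # firstScaleByte can be 0 or 1.
        allowed = (0, 1)
        upper_offset = 2  # lanes 16-31 read firstScaleByte + 2
    else:
        raise ValueError("block_size must be 16 or 32")

    if first_scale_byte not in allowed:
        raise ValueError(
            f"first_scale_byte must be one of {allowed} for block size {block_size}"
        )

    return first_scale_byte + (upper_offset if lane >= 16 else 0)


if __name__ == "__main__":
    # Mirrors the comments in the two blockSize(32) examples in the hunk.
    print(scale_byte_for_lane(0, 32, 0), scale_byte_for_lane(16, 32, 0))
    print(scale_byte_for_lane(0, 32, 2), scale_byte_for_lane(16, 32, 2))
```

For the two `blockSize(32)` examples in the hunk, this model reproduces the byte pairs given in their comments (0 and 1, then 2 and 3).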