[Mlir-commits] [mlir] [mlir][amdgpu] Add scaled_ext_packed{8, 16} operations (PR #159830)

Fri Oct 17 08:49:44 PDT 2025

================
@@ -150,10 +150,50 @@ def AMDGPU_ScaledExtPacked816Op
     When the block size is 32, `firstScaleByte` can be either 0 or 2,
     selecting halves of the scale vectors. Lanes 0-15 will read from
     `firstScaleByte` and lanes 16-31 will read from `firstScaleByte` + 1.
+    For example:
+    ```mlir
+    // Input: 8-element vector of F8E4M3FN, converting to F32
+    // Lanes 0-15 read from byte 0, lanes 16-31 read from byte 1
+    %result = amdgpu.scaled_ext_packed816 %source
+    scale(%scales)
+    blockSize(32)
+    firstScaleLane(0)
+    firstScaleByte(0)
+    : vector<8xf8E4M3FN>, vector<4xf8E8M0FNU> -> vector<8xf32>
----------------
kuhar wrote:

nit: maybe indent and fold this a bit?
```suggestion
    %result = amdgpu.scaled_ext_packed816 %source scale(%scales)
      blockSize(32) firstScaleLane(0) firstScaleByte(0)
      : vector<8xf8E4M3FN>, vector<4xf8E8M0FNU> -> vector<8xf32>
```
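
For anyone reading along, the lane-to-byte rule from the doc comment in the hunk above (block size 32, `firstScaleByte` of 0 or 2, lanes 0-15 vs. 16-31) can be sketched as below. This is only an illustration of that one sentence, not the op's lowering: `scale_byte_for_lane` is a hypothetical helper name, it assumes lanes are numbered 0-31 as in the quoted text, and it does not model `firstScaleLane` or other block sizes.

```python
def scale_byte_for_lane(lane: int, first_scale_byte: int) -> int:
    """Hypothetical helper: which scale byte a lane reads when blockSize is 32.

    Mirrors the quoted doc text only: lanes 0-15 read `firstScaleByte`,
    lanes 16-31 read `firstScaleByte + 1`.
    """
    if first_scale_byte not in (0, 2):
        raise ValueError("firstScaleByte must be 0 or 2 when the block size is 32")
    if not 0 <= lane < 32:
        raise ValueError("this sketch only covers lanes 0..31")
    # The upper half of the 32 lanes reads the next byte.
    return first_scale_byte + (1 if lane >= 16 else 0)


# With firstScaleByte(0), as in the example under review:
low_half = {scale_byte_for_lane(lane, 0) for lane in range(0, 16)}
high_half = {scale_byte_for_lane(lane, 0) for lane in range(16, 32)}
```

With `firstScaleByte(0)` the two halves read bytes 0 and 1, which matches the comment in the example; with `firstScaleByte(2)` they would read bytes 2 and 3.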

https://github.com/llvm/llvm-project/pull/159830