[Mlir-commits] [mlir] [mlir][amdgpu] Add scaled_ext_packed{8, 16} operations (PR #159830)
Erick Ochoa Lopez
llvmlistbot at llvm.org
Fri Oct 17 08:45:37 PDT 2025
================
@@ -112,6 +112,73 @@ def AMDGPU_ExtPackedFp8Op :
}];
}
+def IsValidBlockSize: AttrConstraint<
+ CPred<"::llvm::cast<::mlir::IntegerAttr>($_self).getInt() == 16 || ::llvm::cast<::mlir::IntegerAttr>($_self).getInt() == 32">,
+ "whose value is 16 or 32">;
+
+
+def Vector4Scales :
+ AllOfType<[FixedVectorOfLengthAndType<[4], [F8E8M0FNU]>],
+ "vector of 4 F8E8M0FNU scales",
+ "::mlir::VectorType">,
+ BuildableType<"::mlir::VectorType::get({4}, $_builder.getType<::mlir::Float8E8M0FNUType>());">;
+
+def AMDGPU_ScaledExtPacked816Op
+ : AMDGPU_Op<"scaled_ext_packed816", [Pure]>,
+ Arguments<(
+ ins AnyTypeOf<[VectorOfLengthAndType<[8], [F4E2M1FN,F8E4M3FN,F8E5M2]>,
+ VectorOfLengthAndType<[16], [F6E2M3FN, F6E3M2FN]>]>:$source,
+ Vector4Scales:$scale,
+ ConfinedAttr<I32Attr, [IsValidBlockSize]>:$blockSize,
+ ConfinedAttr<I32Attr, [IntMinValue<0>, IntMaxValue<1>]>:$firstScaleLane,
+ ConfinedAttr<I32Attr, [IntMinValue<0>, IntMaxValue<2>]>:$firstScaleByte)>,
+ Results<(
+ outs AnyTypeOf<[FixedVectorOfLengthAndType<[8], [F32]>,
+ FixedVectorOfLengthAndType<[8], [F16]>,
+ FixedVectorOfLengthAndType<[8], [BF16]>,
+ FixedVectorOfLengthAndType<[16], [F32]>,
+ FixedVectorOfLengthAndType<[16], [F16]>,
+ FixedVectorOfLengthAndType<[16], [BF16]>]>:$res)> {
+
+ let summary = "Extend a vector of packed floating point values";
+
+ let description = [{
+ The scales applied to the input microfloats are stored in two bytes which
+ come from the `scale` input provided in a *half* of the wave identified
+ by `firstScaleLane`. The pair of bytes used is selected by
+ `firstScaleByte`. The 16 vectors in consecutive lanes starting from
+ `firstScaleLane` (which we'll call the scale vectors) will be used by both
+ halves of the wave (with lane L reading from L % 16'th scale vector), but
+ each half will use a different byte.
+
+ When the block size is 32, `firstScaleByte` can be either 0 or 2,
+ selecting halves of the scale vectors. Lanes 0-15 will read from
+ `firstScaleByte` and lanes 16-31 will read from `firstScaleByte` + 1.
+
+ However, when the block size is 16, `firstScaleByte` can be 0 or 1.
+ Lanes 0-15 read from the `firstScaleByte`th element of the scale vectors,
+ while lanes 16-31 read from `firstScaleByte` + 2.
----------------
amd-eochoalo wrote:
https://github.com/llvm/llvm-project/pull/159830/commits/03831000cc808490624ec751624f3a5437c2f9df Thanks!
https://github.com/llvm/llvm-project/pull/159830
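
For readers following the quoted description, the byte selection it describes can be sketched as below. This is only an illustration of the text in the hunk, not code from the patch: the helper names (`isValidFirstScaleByte`, `scaleVectorIndexForLane`, `scaleByteForLane`) are hypothetical, a 32-lane wave is assumed, and how `firstScaleLane` maps to an absolute lane is deliberately left out because the quoted text does not pin it down.

```cpp
#include <cassert>

// Hypothetical helper: checks the pairing of blockSize and firstScaleByte
// stated in the description (32 -> {0, 2}, 16 -> {0, 1}).
bool isValidFirstScaleByte(int blockSize, int firstScaleByte) {
  if (blockSize == 32)
    return firstScaleByte == 0 || firstScaleByte == 2;
  if (blockSize == 16)
    return firstScaleByte == 0 || firstScaleByte == 1;
  return false; // Only block sizes 16 and 32 are accepted by the op.
}

// Hypothetical helper: lane L reads from the (L % 16)th scale vector,
// counted from the first of the 16 scale vectors.
int scaleVectorIndexForLane(int lane) {
  assert(lane >= 0 && lane < 32 && "sketch assumes a 32-lane wave");
  return lane % 16;
}

// Hypothetical helper: index (0-3) of the byte within the 4-element scale
// vector that a given lane reads.
//   blockSize 32: lanes 0-15 read firstScaleByte, lanes 16-31 read +1.
//   blockSize 16: lanes 0-15 read firstScaleByte, lanes 16-31 read +2.
int scaleByteForLane(int lane, int blockSize, int firstScaleByte) {
  assert(lane >= 0 && lane < 32 && "sketch assumes a 32-lane wave");
  assert(isValidFirstScaleByte(blockSize, firstScaleByte));
  const bool upperHalf = lane >= 16;
  const int upperHalfOffset = (blockSize == 32) ? 1 : 2;
  return firstScaleByte + (upperHalf ? upperHalfOffset : 0);
}
```

Under these assumptions, every valid combination yields a byte index in the range 0 to 3, which matches the 4-element `Vector4Scales` operand type.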