[Mlir-commits] [mlir] [mlir][ArmNeon] Adds Arm Neon SMMLA, UMMLA, and USMMLA Intrinsics (PR #80511)
Benjamin Maxwell
llvmlistbot at llvm.org
Tue Feb 6 02:09:43 PST 2024
================
@@ -120,6 +120,99 @@ def SdotOp : ArmNeon_OverloadedOperandsWithOneResultIntrOp<"sdot",[1], [
"$a `,` $b `,` $c attr-dict `:` type($b) `,` type($c) `to` type($res)";
}
+def SmmlaOp : ArmNeon_OverloadedOperandsWithOneResultIntrOp<"smmla",[1], [
+ Pure,
+ AllTypesMatch<["src1", "src2"]>,
+ AllTypesMatch<["acc", "res"]>,
+ ]> {
+ let summary = "Matrix-matrix multiply and accumulate op";
+ let description = [{
+ SMMLA: Signed integer matrix multiply-accumulate.
+
+ Signed 8-bit integer matrix multiply-accumulate. This instruction multiplies
+ the 2x8 matrix of signed 8-bit integer values in the first source vector by
+ the 8x2 matrix of signed 8-bit integer values in the second source vector.
+ The resulting 2x2 32-bit integer matrix product is destructively added to
+ the 32-bit integer matrix accumulator in the destination vector. This is
+ equivalent to performing an 8-way dot product per destination element.
+
+ Source:
+ https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=smmla
+ }];
+ // Supports (vector<16xi8>, vector<16xi8>) -> (vector<4xi32>)
+ let arguments = (ins
+ VectorOfLengthAndType<[4], [I32]>:$acc,
+ VectorOfLengthAndType<[16], [I8]>:$src1,
+ VectorOfLengthAndType<[16], [I8]>:$src2
+ );
+ let results = (outs VectorOfLengthAndType<[4], [I32]>:$res);
+ let assemblyFormat =
+ "$acc `,` $src1 `,` $src2 attr-dict `:` type($src1) `to` type($res)";
+}
+
+def UmmlaOp : ArmNeon_OverloadedOperandsWithOneResultIntrOp<"ummla",[1], [
+ Pure,
+ AllTypesMatch<["src1", "src2"]>,
+ AllTypesMatch<["acc", "res"]>,
+ ]> {
+ let summary = "Unsigned matrix-matrix multiply and accumulate op";
+ let description = [{
+ UMMLA: Unsigned integer matrix multiply-accumulate.
+
+ Unsigned 8-bit integer matrix multiply-accumulate. This instruction
+ multiplies the 2x8 matrix of unsigned 8-bit integer values in the first
+ source vector by the 8x2 matrix of unsigned 8-bit integer values in the
+ second source vector. The resulting 2x2 32-bit integer matrix product is
+ destructively added to the 32-bit integer matrix accumulator in the
+ destination vector. This is equivalent to performing an 8-way dot product
+ per destination element.
+
+ Source:
+ https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=ummla
+ }];
+ // Supports (vector<16xi8>, vector<16xi8>) -> (vector<4xi32>)
+ let arguments = (ins
+ VectorOfLengthAndType<[4], [I32]>:$acc,
+ VectorOfLengthAndType<[16], [I8]>:$src1,
+ VectorOfLengthAndType<[16], [I8]>:$src2
+ );
+ let results = (outs VectorOfLengthAndType<[4], [I32]>:$res);
+ let assemblyFormat =
+ "$acc `,` $src1 `,` $src2 attr-dict `:` type($src1) `to` type($res)";
+}
+
+def UsmmlaOp : ArmNeon_OverloadedOperandsWithOneResultIntrOp<"usmmla",[1], [
+ Pure,
+ AllTypesMatch<["src1", "src2"]>,
+ AllTypesMatch<["acc", "res"]>,
+ ]> {
+ let summary = "Unsigned and signed matrix-matrix multiply and accumulate op";
+ let description = [{
+ USMMLA: Unsigned and signed integer matrix multiply-accumulate.
+
+ Unsigned and signed 8-bit integer matrix multiply-accumulate. This
+ instruction multiplies the 2x8 matrix of unsigned 8-bit integer values in
+ the first source vector by the 8x2 matrix of signed 8-bit integer values in
+ the second source vector. The resulting 2x2 32-bit integer matrix product is
+ destructively added to the 32-bit integer matrix accumulator in the
+ destination vector. This is equivalent to performing an 8-way dot product
+ per destination element.
+
+
+ Source:
+ https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=usmmla
+ }];
+ // Supports (vector<16xi8>, vector<16xi8>) -> (vector<4xi32>)
+ let arguments = (ins
+ VectorOfLengthAndType<[4], [I32]>:$acc,
+ VectorOfLengthAndType<[16], [I8]>:$src1,
+ VectorOfLengthAndType<[16], [I8]>:$src2
----------------
MacDue wrote:
Note: `VectorOfLengthAndType` is more permissive than you may expect.
`vector<4x2x2x1xi8>` is a vector of length 16 of type `i8`. More surprisingly, so is any variation on scalability, so `vector<2x[8]xi8>` is also a vector of length 16 and type `i8`.
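
To illustrate the point, here is a minimal Python sketch of a length-and-type check. It assumes the constraint compares only the total element count (the product of the dimension sizes) and the element type, and that a scalable dimension such as `[8]` contributes its base size; that is my reading of the behaviour described above, not a copy of the actual predicate. The names `num_elements` and `matches_length_and_type` are hypothetical.

```python
from math import prod


def num_elements(shape):
    """Total element count of a shape: the product of its dimension sizes."""
    return prod(shape)


def matches_length_and_type(shape, elem_type, allowed_lengths, allowed_types):
    """Hypothetical stand-in for a VectorOfLengthAndType-style check.

    Assumption: only the element count and element type are compared;
    rank and scalability are ignored.
    """
    return num_elements(shape) in allowed_lengths and elem_type in allowed_types


# Shapes written as tuples; the scalable dim [8] is modelled by its base size 8.
shapes = {
    "vector<16xi8>": (16,),
    "vector<4x2x2x1xi8>": (4, 2, 2, 1),
    "vector<2x[8]xi8>": (2, 8),
}

for name, shape in shapes.items():
    accepted = matches_length_and_type(shape, "i8", [16], ["i8"])
    print(f"{name}: accepted={accepted}")
```

Under these assumptions all three shapes are accepted, so if only the 1-D fixed-size `vector<16xi8>` is intended, the op would need a stricter constraint that also pins down rank and scalability.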
https://github.com/llvm/llvm-project/pull/80511