[Mlir-commits] [mlir] [MLIR][NVVM] Add support for dp4a instructions (PR #139043)

Guray Ozen llvmlistbot at llvm.org
Thu May 8 03:56:34 PDT 2025


================
@@ -3444,6 +3444,54 @@ def NVVM_Tcgen05StOp : NVVM_Op<"tcgen05.st"> {
   let hasVerifier = 1;
 }
 
+//===----------------------------------------------------------------------===//
+// NVVM dp4a Op
+//===----------------------------------------------------------------------===//
+
+def NVVM_Dp4aOp : NVVM_Op<"dp4a"> {
+  let summary = "Four-way byte dot product-accumulate instruction.";
+  let description = [{
+    Performs a four-way byte dot-product which is accumulated in a 32-bit
+    result.
+    Operand `a` and `b` can be passed either as packed 32-bit inputs holding
+    4 byte-inputs for the dot product, or as vectors of 4 i8 elements.
+    The `a_signed` and `b_signed` unit attributes specify whether the
+    individual byte inputs in operands `a` and `b` are signed or unsigned
+    respectively.
+    Operand `c` is a 32-bit integer to which the result is accumulated. It is
+    treated as holding a signed integer if any of `a` or `b` are signed.
+    
+    [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#integer-arithmetic-instructions-dp4a)
+  }];
+  
+  let arguments = (ins
+    AnyTypeOf<[I32, VectorOfLengthAndType<[4], [I8]>]>:$a,
+    AnyTypeOf<[I32, VectorOfLengthAndType<[4], [I8]>]>:$b,
+    I32:$c,
+    DefaultValuedAttr<UnitAttr, "false">:$a_signed,
+    DefaultValuedAttr<UnitAttr, "false">:$b_signed
----------------
grypp wrote:

Right, adding four ops is overkill.
Then maybe we can use explicit element types rather than `UnitAttr`:
```
%2 = nvvm.dp4a %a : s8, %b : u8, %c: vector<4xi8>, vector<4xi8>
```
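
For reference, the semantics in the op description above can be sketched as a scalar model in plain C++. This is only an illustration, not code from the PR: `dp4a_ref` and `extend_byte` are hypothetical names, the inputs are assumed to be the packed 32-bit form, and the accumulation is assumed to wrap modulo 2^32 rather than saturate.

```cpp
#include <cstdint>

// Hypothetical helper: extract byte `idx` (0 = least significant) from a
// packed 32-bit word and sign- or zero-extend it to 32 bits.
static int32_t extend_byte(uint32_t word, int idx, bool is_signed) {
  uint8_t byte = static_cast<uint8_t>(word >> (8 * idx));
  return is_signed ? static_cast<int32_t>(static_cast<int8_t>(byte))
                   : static_cast<int32_t>(byte);
}

// Hypothetical scalar model of dp4a: c + sum over i of a[i] * b[i].
// The result is returned as raw 32 bits; reinterpret it as signed when
// either a_signed or b_signed is set. Wrap-around on overflow is an
// assumption of this sketch.
uint32_t dp4a_ref(uint32_t a, uint32_t b, uint32_t c,
                  bool a_signed, bool b_signed) {
  uint32_t acc = c;
  for (int i = 0; i < 4; ++i) {
    // Each product fits in int32_t: the magnitude is at most 255 * 255.
    int32_t prod = extend_byte(a, i, a_signed) * extend_byte(b, i, b_signed);
    acc += static_cast<uint32_t>(prod);
  }
  return acc;
}
```

With this model, the signedness of each operand only changes how its bytes are extended, which is what either the unit attributes or explicit `s8`/`u8` types would need to encode.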

https://github.com/llvm/llvm-project/pull/139043
