[Mlir-commits] [mlir] [mlir] `im2col` & `l2cache` on `cp.async.bulk.tensor.shared.cluster.global` (PR #72967)
llvmlistbot at llvm.org
Tue Nov 21 00:45:56 PST 2023
================
@@ -1404,20 +1404,34 @@ def NVVM_CpAsyncBulkTensorGlobalToSharedClusterOp :
AttrSizedOperandSegments]>,
Arguments<(ins LLVM_PointerShared:$dstMem,
LLVM_AnyPointer:$tmaDescriptor,
- LLVM_PointerShared:$mbar,
- Optional<I16>:$multicastMask,
Variadic<I32>:$coordinates,
+ LLVM_PointerShared:$mbar,
+ Variadic<I16>:$im2colOffsets,
+ Optional<I16>:$multicastMask,
+ Optional<I64>:$l2CacheHint,
PtxPredicate:$predicate)> {
let description = [{
Initiates an asynchronous copy operation on the tensor data from global
memory to shared memory.
+ The Op operates has two load modes:
+ 1) Tiled Mode: It's the default mode. The source multi-dimensional tensor
+ layout is preserved at the destination.
+
+ 2) Im2col Mode: This mode is used when `im2colOffsets` operands are present.
+ the elements in the Bounding Box of the source tensor are rearranged into
+ columns at the destination. In this mode, the tensor has to be at least
+ 3-dimensional.
+
The `multicastMask` operand is optional. When it is present, the Op copies
data from global memory to shared memory of multiple CTAs in the cluster.
Operand `multicastMask` specifies the destination CTAs in the cluster such
that each bit position in the 16-bit `multicastMask` operand corresponds to
- the `nvvm.read.ptx.sreg.ctaid` of the destination CTA.
+ the `nvvm.read.ptx.sreg.ctaid` of the destination CTA.
+ The `l2CacheHint` operand is optinal, and it is used to specify cache
----------------
durga4github wrote:
Sorry for the nit: 'optinal' should be 'optional'.
https://github.com/llvm/llvm-project/pull/72967
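
For readers following the `multicastMask` wording in the hunk above (each bit position in the 16-bit mask corresponds to a destination CTA id), the bit layout can be sketched roughly as below. This is only an illustration: `makeMulticastMask` is a hypothetical helper and is not part of the PR or of MLIR.

```cpp
#include <cstdint>
#include <initializer_list>
#include <stdexcept>

// Hypothetical helper (not part of the PR): builds a 16-bit multicast mask
// by setting bit i for every destination CTA id i in the cluster.
inline std::uint16_t makeMulticastMask(std::initializer_list<unsigned> ctaIds) {
  std::uint16_t mask = 0;
  for (unsigned id : ctaIds) {
    // Only ids 0..15 have a bit position in a 16-bit mask.
    if (id >= 16)
      throw std::out_of_range("CTA id does not fit in a 16-bit mask");
    mask = static_cast<std::uint16_t>(mask | (1u << id));
  }
  return mask;
}
```

Under this sketch, `makeMulticastMask({0, 2})` yields `0x0005`, i.e. the copy would target the CTAs with ids 0 and 2.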