[Mlir-commits] [mlir] [MLIR-NVVMDialect]: Add cp.async.mbarrier.arrive Op (PR #74241)
llvmlistbot at llvm.org
Mon Dec 4 06:00:15 PST 2023
================
@@ -592,6 +592,42 @@ def NVVM_CpAsyncWaitGroupOp : NVVM_Op<"cp.async.wait.group">,
let assemblyFormat = "$n attr-dict";
}
+def NVVM_CpAsyncMBarrierArriveOp : NVVM_Op<"cp.async.mbarrier.arrive"> {
+ let summary = "NVVM Dialect Op for cp.async.mbarrier.arrive";
+ let description = [{
+ The `cp.async.mbarrier.arrive` Op makes the mbarrier object track
+ all prior cp.async operations initiated by the executing thread.
+ The `addr` operand specifies the address of the mbarrier object
+ in generic or shared memory space. The `noinc` attr controls
+ how the mbarrier's state is updated.
+ [For more information, refer to the PTX ISA]
+ (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive)
+ }];
+ let assemblyFormat = "$addr attr-dict `:` type(operands)";
+
+ let arguments = (ins
+ AnyTypeOf<[LLVM_AnyPointer, LLVM_PointerShared]>:$addr,
+ DefaultValuedAttr<I1Attr, "0">:$noinc);
+
+ string llvmBuilder = [{
+ unsigned addressSpace =
+ llvm::cast<LLVM::LLVMPointerType>(op.getAddr().getType()).getAddressSpace();
+ bool isShared = addressSpace == NVVM::kSharedMemorySpace;
+
+ llvm::Intrinsic::ID intId;
+ if ($noinc) {
+ intId = isShared ?
+ llvm::Intrinsic::nvvm_cp_async_mbarrier_arrive_noinc_shared :
+ llvm::Intrinsic::nvvm_cp_async_mbarrier_arrive_noinc;
+ } else {
+ intId = isShared ?
+ llvm::Intrinsic::nvvm_cp_async_mbarrier_arrive_shared :
+ llvm::Intrinsic::nvvm_cp_async_mbarrier_arrive;
+ }
+ createIntrinsicCall(builder, intId, {$addr});
+ }];
+}
+
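Based on the assembly format declared above, the op would be written roughly as follows (a sketch only; the `%bar`/`%gbar` SSA names are hypothetical, and the exact printed form of the `noinc` attribute may differ):

```mlir
// mbarrier in shared memory (address space 3): shared-variant intrinsic.
nvvm.cp.async.mbarrier.arrive %bar : !llvm.ptr<3>
// mbarrier via a generic pointer, with noinc set: generic noinc intrinsic.
nvvm.cp.async.mbarrier.arrive %gbar {noinc = true} : !llvm.ptr
```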
----------------
durga4github wrote:
This was exactly the item I wanted to get our preference on!
I did notice that the MBarrier Ops in NVVM have two implementations depending on shared/generic.
However, I admit I did not look into the NVGPU side of things. Looking at the lowering code you pointed to, it makes sense to abstract (or split) in only one place/Dialect. I am also divided on where we _should_ do it and why one would be better than the other.
In any case, I don't want to create a diverged implementation within the same (MBarrier) family of Ops. So, I will refresh the patch with a split implementation.
(Another thought: at the intrinsics level, there are already separate implementations for generic and shared. This way, everything below NVGPU has separate Ops, with NVGPU being the layer of abstraction.)
https://github.com/llvm/llvm-project/pull/74241